SYNTHETIC NUCLEIC ACID MOLECULE COMPOSITIONS AND
METHODS OF PREPARATION 

Statement of Government Rights 

The invention was made at least in part with a grant from the
Government of the United States of America (grant DMI-9402762 from the
National Science Foundation). The Government has certain rights to the
invention.

Background of the Invention

Transcription, the synthesis of an RNA molecule from a sequence of
DNA, is the first step in gene expression. Sequences which regulate DNA
transcription include promoter sequences, polyadenylation signals, transcription
factor binding sites and enhancer elements. A promoter is a DNA sequence
capable of specific initiation of transcription and consists of three general
regions. The core promoter is the sequence where the RNA polymerase and its
cofactors bind to the DNA. Immediately upstream of the core promoter is the
proximal promoter, which contains several transcription factor binding sites that
are responsible for the assembly of an activation complex that in turn recruits the
polymerase complex. The distal promoter, located further upstream of the
proximal promoter, also contains transcription factor binding sites. Transcription
termination and polyadenylation, like transcription initiation, are site specific
and encoded by defined sequences. Enhancers are regulatory regions, containing
multiple transcription factor binding sites, that can significantly increase the
level of transcription from a responsive promoter regardless of the enhancer's
orientation and distance with respect to the promoter as long as the enhancer and
promoter are located within the same DNA molecule. The amount of transcript
produced from a gene may also be regulated by a post-transcriptional
mechanism, the most important being RNA splicing that removes intervening
sequences (introns) from a primary transcript between splice donor and splice
acceptor sequences.

Natural selection is the hypothesis that genotype-environment
interactions occurring at the phenotypic level lead to differential reproductive
success of individuals and therefore to modification of the gene pool of a
population.

Some properties of nucleic acid molecules that are acted upon by natural
selection include codon usage frequency, RNA secondary structure, the
efficiency of intron splicing, and interactions with transcription factors or other
nucleic acid binding proteins. Because of the degenerate nature of the genetic
code, these properties can be optimized by natural selection without altering the
corresponding amino acid sequence.
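
The degeneracy described above can be illustrated with a minimal sketch. The two short sequences and the helper names (`translate`, `synonymous_codons`) are assumptions made for illustration and are not part of the original disclosure; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))

def synonymous_codons(amino_acid):
    """List every codon that encodes the given amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]

# Two hypothetical sequences that differ at three of four codons.
wild_type = "ATGCTTTCTGGA"
recoded = "ATGCTGAGCGGC"

print(translate(wild_type), translate(recoded))  # same polypeptide from both
print(synonymous_codons("L"))                    # the six leucine codons
```

Because several codons map to one amino acid, the nucleotide sequence can change substantially while the encoded polypeptide stays the same.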

Under some conditions, it is useful to synthetically alter the natural
nucleotide sequence encoding a polypeptide to better adapt the polypeptide for
alternative applications. A common example is to alter the codon usage
frequency of a gene when it is expressed in a foreign host cell. Although
redundancy in the genetic code allows amino acids to be encoded by multiple
codons, different organisms favor some codons over others. It has been found
that the efficiency of protein translation in a non-native host cell can be
substantially increased by adjusting the codon usage frequency but maintaining
the same gene product (U.S. Patent Nos. 5,096,825, 5,670,356, and 5,874,304).
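
As a rough illustration of what "codon usage frequency" measures, the following sketch tallies how often each codon appears in a coding sequence. The function name `codon_usage` and the example sequence are hypothetical and serve only to make the idea concrete.

```python
from collections import Counter

def codon_usage(dna):
    """Return {codon: fraction of all codons} for an in-frame coding sequence."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    counts = Counter(dna[i:i + 3] for i in range(0, len(dna), 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

# Hypothetical four-codon sequence: CTG twice, CTT once (all leucine), AAG once (lysine).
usage = codon_usage("CTGCTGCTTAAG")
print(usage)
```

Comparing such a table for a gene against the table for a host organism shows which codons the host favors and the gene does not.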

However, altering codon usage may, in turn, result in the unintentional
introduction into a synthetic nucleic acid molecule of inappropriate transcription
regulatory sequences. This may adversely affect transcription, resulting in
anomalous expression of the synthetic DNA. Anomalous expression is defined
as departure from normal or expected levels of expression. For example,
transcription factor binding sites located downstream from a promoter have been
demonstrated to affect promoter activity (Michael et al., 1990; Lamb et al., 1998;
Johnson et al., 1998; Jones et al., 1997). Additionally, it is not uncommon for
an enhancer element to exert activity and result in elevated levels of DNA
transcription in the absence of a promoter sequence or for the presence of
transcription regulatory sequences to increase the basal levels of gene expression
in the absence of a promoter sequence.

Thus, what is needed is a method for making synthetic nucleic acid
molecules with altered codon usage without also introducing inappropriate or
unintended transcription regulatory sequences for expression in a particular host
cell.

Summary of the Invention 

The invention provides a synthetic nucleic acid molecule comprising at
least 300 nucleotides of a coding region for a polypeptide, having a codon
composition differing at more than 25% of the codons from a wild type nucleic
acid sequence encoding a polypeptide, and having at least 3-fold fewer,
preferably at least 5-fold fewer, transcription regulatory sequences than would
result if the differing codons were randomly selected. Preferably, the synthetic
nucleic acid molecule encodes a polypeptide that has an amino acid sequence
that is at least 85%, preferably 90%, and most preferably 95% or 99% identical
to the amino acid sequence of the naturally-occurring (native or wild type)
polypeptide (protein) from which it is derived. Thus, it is recognized that some
specific amino acid changes may also be desirable to alter a particular
phenotypic characteristic of the polypeptide encoded by the synthetic nucleic
acid molecule. Preferably, the amino acid sequence identity is over at least 100
contiguous amino acid residues. In one embodiment of the invention, the codons
in the synthetic nucleic acid molecule that differ preferably encode the same
amino acids as the corresponding codons in the wild type nucleic acid sequence.
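
The codon-composition criterion above (more than 25% of codons differing, over at least 300 nucleotides) can be checked with a short sketch. The function names, the toy sequences, and the assumption that the two sequences are already aligned codon for codon are illustrative only.

```python
def codon_difference(wild_type, synthetic):
    """Return (differing codons, total codons) for two aligned in-frame sequences."""
    if len(wild_type) != len(synthetic) or len(wild_type) % 3:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    pairs = [(wild_type[i:i + 3], synthetic[i:i + 3]) for i in range(0, len(wild_type), 3)]
    differing = sum(1 for wt, syn in pairs if wt != syn)
    return differing, len(pairs)

def meets_codon_threshold(wild_type, synthetic, min_fraction=0.25, min_length=300):
    """True if the synthetic sequence is long enough and differs at more than min_fraction of codons."""
    differing, total = codon_difference(wild_type, synthetic)
    return len(synthetic) >= min_length and differing / total > min_fraction

# Hypothetical 5-codon sequences; a real comparison would use full coding regions.
wt = "ATGCTTTCTGGAAAA"
syn = "ATGCTGAGCGGCAAG"
differing, total = codon_difference(wt, syn)
print(differing, total)                               # codons that differ, codons compared
print(meets_codon_threshold(wt, syn, min_length=15))  # length limit relaxed for the toy example
```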

The transcription regulatory sequences which are reduced in the synthetic
nucleic acid molecule include, but are not limited to, any combination of
transcription factor binding sequences, intron splice sites, poly(A) addition sites,
enhancer sequences and promoter sequences. Transcription regulatory sequences
are well known in the art.

It is preferred that the synthetic nucleic acid molecule of the invention
has a codon composition that differs from that of the wild type nucleic acid
sequence at more than 30%, 35%, 40% or more than 45%, e.g., 50%, 55%, 60%
or more of the codons. Preferred codons for use in the invention are those which
are employed more frequently than at least one other codon for the same amino
acid in a particular organism and, more preferably, are also not low-usage
codons in that organism and are not low-usage codons in the organism used to
clone or screen for the expression of the synthetic nucleic acid molecule (for
example, E. coli). Moreover, preferred codons for certain amino acids (i.e.,
those amino acids that have three or more codons) may include two or more
codons that are employed more frequently than the other (non-preferred)
codon(s). The presence of codons in the synthetic nucleic acid molecule that are
employed more frequently in one organism than in another organism results in a
synthetic nucleic acid molecule which, when introduced into the cells of the
organism that employs those codons more frequently, is expressed in those cells
at a level that is greater than the expression of the wild type or parent nucleic
acid sequence in those cells. For example, the synthetic nucleic acid molecule of
the invention is expressed at a level that is at least about 110%, e.g., 150%,
200%, 500% or more (1000%, 5000%, or 10000%) of that of the wild type
nucleic acid sequence in a cell or cell extract under identical conditions (such as
cell culture conditions, vector backbone, and the like).

In one embodiment of the invention, the codons that are different are
those employed more frequently in a mammal, while in another embodiment the
codons that are different are those employed more frequently in a plant. A
particular type of mammal, e.g., human, may have a different set of preferred
codons than another type of mammal. Likewise, a particular type of plant may
have a different set of preferred codons than another type of plant. In one
embodiment of the invention, the majority of the codons which differ are ones
that are preferred codons in a desired host cell. Preferred codons for mammals
(e.g., humans) and plants are known to the art (e.g., Wada et al., 1990). For
example, preferred human codons include, but are not limited to, CGC (Arg),
CTG (Leu), TCT (Ser), AGC (Ser), ACC (Thr), CCA (Pro), CCT (Pro), GCC
(Ala), GGC (Gly), GTG (Val), ATC (Ile), ATT (Ile), AAG (Lys), AAC (Asn),
CAG (Gln), CAC (His), GAG (Glu), GAC (Asp), TAC (Tyr), TGC (Cys) and
TTC (Phe) (Wada et al., 1990). Thus, preferred "humanized" synthetic nucleic
acid molecules of the invention have a codon composition which differs from a
wild type nucleic acid sequence by having an increased number of the preferred
human codons, e.g., CGC, CTG, TCT, AGC, ACC, CCA, CCT, GCC, GGC,
GTG, ATC, ATT, AAG, AAC, CAG, CAC, GAG, GAC, TAC, TGC, TTC, or
any combination thereof. For example, the synthetic nucleic acid molecule of
the invention may have an increased number of CTG or TTG leucine-encoding
codons, GTG or GTC valine-encoding codons, GGC or GGT glycine-encoding
codons, ATC or ATT isoleucine-encoding codons, CCA or CCT proline-encoding
codons, CGC or CGT arginine-encoding codons, AGC or TCT serine-encoding
codons, ACC or ACT threonine-encoding codons, GCC or GCT
alanine-encoding codons, or any combination thereof, relative to the wild type
nucleic acid sequence. Similarly, synthetic nucleic acid molecules having an
increased number of codons that are employed more frequently in plants have a
codon composition which differs from a wild type or parent nucleic acid
sequence by having an increased number of the plant codons including, but not
limited to, CGC (Arg), CTT (Leu), TCT (Ser), TCC (Ser), ACC (Thr), CCA
(Pro), CCT (Pro), GCT (Ala), GGA (Gly), GTG (Val), ATC (Ile), ATT (Ile),
AAG (Lys), AAC (Asn), CAA (Gln), CAC (His), GAG (Glu), GAC (Asp), TAC
(Tyr), TGC (Cys), TTC (Phe), or any combination thereof (Murray et al., 1989).
Preferred codons may differ for different types of plants (Wada et al., 1990).

The choice of codon may be influenced by many factors such as, for
example, the desire to have an increased number of nucleotide substitutions or
a decreased number of transcription regulatory sequences. Under some
circumstances (e.g., to permit removal of a transcription factor binding site) it
may be desirable to replace a non-preferred codon with a codon other than a
preferred codon or a codon other than the most preferred codon. Under other
circumstances, for example, to prepare codon distinct versions of a synthetic
nucleic acid molecule, preferred codon pairs are selected based upon the largest
number of mismatched bases, as well as the criteria described above.
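
Selecting a codon pair by the largest number of mismatched bases can be sketched as follows. The helper names (`mismatches`, `most_distinct_pair`) are hypothetical, and the candidate lists are drawn from codons named in the text purely for illustration; the sketch covers only the mismatch criterion, not the other criteria described above.

```python
from itertools import combinations

def mismatches(codon_a, codon_b):
    """Count the positions at which two codons differ."""
    return sum(1 for a, b in zip(codon_a, codon_b) if a != b)

def most_distinct_pair(candidates):
    """Return the pair of candidate codons with the most mismatched bases."""
    return max(combinations(candidates, 2), key=lambda pair: mismatches(*pair))

# Codon pairs named in the text for "humanized" sequences.
print(mismatches("AGC", "TCT"))  # serine pair: differs at every position
print(mismatches("CTG", "TTG"))  # leucine pair: differs at one position

# Illustrative candidate set of serine codons.
serine_pair = most_distinct_pair(["TCT", "TCC", "AGC"])
print(serine_pair)
```

Using one codon of such a pair in the first synthetic molecule and the other in the second maximizes nucleotide differences at that position while encoding the same amino acid.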

The presence of codons in the synthetic nucleic acid molecule that are
employed more frequently in one organism than in another organism results in a
synthetic nucleic acid molecule which, when introduced into a cell of the
organism that employs those codons, is expressed in that cell at a level which is
greater than the level of expression of the wild type or parent nucleic acid
sequence.

A synthetic nucleic acid molecule of the invention may encode a
selectable marker protein or a reporter molecule. However, the invention
applies to any gene and is not limited to synthetic reporter genes or synthetic
selectable marker genes. In one embodiment of a synthetic nucleic acid
molecule of the invention that is a reporter molecule, the synthetic nucleic acid
molecule encodes a luciferase having a codon composition different than that of
a wild type or parent Renilla luciferase or a beetle luciferase nucleic acid
sequence. A synthetic click beetle luciferase nucleic acid molecule of the
invention may optionally encode the amino acid valine at position 224 (i.e., it
emits green light), or may optionally encode the amino acid histidine at position
224, histidine at position 247, isoleucine at position 346, glutamine at position
348 or a combination thereof (i.e., it emits red light). Preferred synthetic
luciferase nucleic acid molecules that are related to a wild type Renilla luciferase
nucleic acid sequence include, but are not limited to, SEQ ID NO:21 (Rlucver2)
or SEQ ID NO:22 (Rluc-final). Preferred synthetic luciferase nucleic acid
molecules that are related to click beetle luciferase nucleic acid sequences
include, but are not limited to, SEQ ID NO:7 (GRver5), SEQ ID NO:8 (GR6),
SEQ ID NO:9 (GRver5.1), SEQ ID NO:14 (RDver5), SEQ ID NO:15 (RD7),
SEQ ID NO:16 (RDver5.1), SEQ ID NO:17 (RDver5.2) or SEQ ID NO:18
(RD156-1H9).

The invention also provides an expression cassette. The expression
cassette of the invention comprises a synthetic nucleic acid molecule of the
invention operatively linked to a promoter that is functional in a cell. Preferred
promoters are those functional in mammalian cells and those functional in plant
cells. Optionally, the expression cassette may include other sequences, e.g.,
restriction enzyme recognition sequences and a Kozak sequence, and be a part of
a larger polynucleotide molecule such as a plasmid, cosmid, artificial
chromosome or vector, e.g., a viral vector.

Also provided is a host cell comprising the synthetic nucleic acid
molecule of the invention, an isolated polypeptide (e.g., a fusion polypeptide
encoded by the synthetic nucleic acid molecule of the invention), and
compositions and kits comprising the synthetic nucleic acid molecule of the
invention or the polypeptide encoded thereby in suitable container means and,
optionally, instruction means. Preferred isolated polypeptides include, but are
not limited to, those comprising SEQ ID NO:31 (GRver5.1), SEQ ID NO:226
(Rluc-final), or SEQ ID NO:223 (RD156-1H9).

The invention also provides a method to prepare a synthetic nucleic acid
molecule of the invention by genetically altering a parent (either a wild type or
another synthetic) nucleic acid sequence. The method may be used to prepare a
synthetic nucleic acid molecule encoding a polypeptide comprising at least 100
amino acids. One embodiment of the invention is directed to the preparation of
synthetic genes encoding reporter or selectable marker proteins. The method of
the invention may be employed to alter the codon usage frequency and decrease
the number of transcription regulatory sequences in any open reading frame or to
decrease the number of transcription regulatory sites in a vector backbone.
Preferably, the codon usage frequency in the synthetic nucleic acid molecule is
altered to reflect that of the host organism desired for expression of that nucleic
acid molecule while also decreasing the number of potential transcription
regulatory sequences relative to the parent nucleic acid molecule.

Thus, the invention provides a method to prepare a synthetic nucleic acid
molecule comprising an open reading frame. The method comprises altering
(e.g., decreasing or eliminating) a plurality of transcription regulatory sequences
in a parent (wild type or a synthetic) nucleic acid sequence that encodes a
polypeptide having at least 100 amino acids to yield a synthetic nucleic acid
molecule which has a decreased number of transcription regulatory sequences
and which preferably encodes the same amino acids as the parent nucleic acid
molecule. The transcription regulatory sequences are selected from the group
consisting of transcription factor binding sequences, intron splice sites, poly(A)
addition sites, enhancer sequences and promoter sequences, and the resulting
synthetic nucleic acid molecule has at least 3-fold fewer, preferably 5-fold fewer,
transcription regulatory sequences relative to the parent nucleic acid sequence.
The method also comprises altering greater than 25% of the codons in the
synthetic nucleic acid sequence which has a decreased number of transcription
regulatory sequences to yield a further synthetic nucleic acid molecule, wherein
the codons that are altered encode the same amino acids as those in the
corresponding position in the synthetic nucleic acid molecule which has a
decreased number of transcription regulatory sequences and/or in the parent
nucleic acid sequence. Preferably, the codons which are altered do not result in
an increase in transcriptional regulatory sequences. Preferably, the further
synthetic nucleic acid molecule encodes a polypeptide that has at least 85%,
preferably 90%, and most preferably 95% or 99% contiguous amino acid
sequence identity to the amino acid sequence of the polypeptide encoded by the
parent nucleic acid sequence.

Alternatively, the method comprises altering greater than 25% of the
codons in a parent nucleic acid sequence which encodes a polypeptide having at
least 100 amino acids to yield a codon-altered synthetic nucleic acid molecule,
wherein the codons that are altered encode the same amino acids as those present
in the corresponding positions in the parent nucleic acid sequence. Then, a
plurality of transcription regulatory sequences in the codon-altered synthetic
nucleic acid molecule are altered to yield a further synthetic nucleic acid
molecule. Preferably, the codons which are altered do not result in an increase in
transcriptional regulatory sequences. Also, preferably, the further synthetic
nucleic acid molecule encodes a polypeptide that has at least 85%, preferably
90%, and most preferably 95% or 99% contiguous amino acid sequence identity
to the amino acid sequence of the polypeptide encoded by the parent nucleic acid
sequence. Also provided is a synthetic (including a further synthetic) nucleic
acid molecule prepared by the methods of the invention.
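
One step of the method, removing a transcription regulatory sequence by synonymous codon changes, can be sketched under loud assumptions. The greedy search below is an illustrative stand-in and is not the procedure of the invention; the motif list contains only the canonical poly(A) signal hexamer AATAAA as a placeholder for a real collection of regulatory sequences, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

MOTIFS = ("AATAAA",)  # placeholder motif list (assumption): poly(A) signal only

def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def count_sites(dna, motifs):
    """Count occurrences (overlapping allowed) of every motif in the sequence."""
    return sum(dna.startswith(m, i) for m in motifs for i in range(len(dna) - len(m) + 1))

def remove_sites(dna, motifs):
    """Greedily swap in synonymous codons whenever a swap lowers the motif count."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    best = count_sites(dna, motifs)
    for idx in range(len(codons)):
        if best == 0:
            break
        for alt in SYNONYMS[CODON_TABLE[codons[idx]]]:
            trial = codons[:idx] + [alt] + codons[idx + 1:]
            n = count_sites("".join(trial), motifs)
            if n < best:  # keep only changes that remove sites
                codons, best = trial, n
    return "".join(codons)

parent = "ATGAATAAAGGC"  # hypothetical sequence; AAT-AAA spans the motif
cleaned = remove_sites(parent, MOTIFS)
print(cleaned, count_sites(cleaned, MOTIFS), translate(cleaned) == translate(parent))
```

A fuller treatment would also re-check, after every codon change, that no new regulatory sequence has been introduced, in line with the preference stated above.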

As described hereinbelow, the methods of the invention were employed
with click beetle luciferase and Renilla luciferase nucleic acid sequences. While
both of these nucleic acid molecules encode luciferase proteins, they are from
entirely different families and are widely separated evolutionarily. These
proteins have unrelated amino acid sequences and protein structures, and they
utilize dissimilar chemical substrates. The fact that they share the name
"luciferase" should not be interpreted to mean that they are from the same
family, or even largely similar families. The methods produced synthetic
luciferase nucleic acid molecules which exhibited significantly enhanced levels
of mammalian expression without negatively affecting other desirable
properties (including protein half-life) and which were also largely devoid of
known transcription regulatory elements.

The invention also provides at least two synthetic nucleic acid molecules
that encode highly related polypeptides, but which synthetic nucleic acid
molecules have an increased number of nucleotide differences relative to each
other. These differences decrease the recombination frequency between the two
synthetic nucleic acid molecules when those molecules are both present in a cell
(i.e., they are "codon distinct" versions of a synthetic nucleic acid molecule).
Thus, the invention provides a method for preparing at least two synthetic
nucleic acid molecules that are codon distinct versions of a parent nucleic acid
sequence that encodes a polypeptide. The method comprises altering a parent
nucleic acid sequence to yield a first synthetic nucleic acid molecule having an
increased number of a first plurality of codons that are employed more
frequently in a selected host cell relative to the number of those codons present
in the parent nucleic acid sequence. Optionally, the first synthetic nucleic acid
molecule also has a decreased number of transcription regulatory sequences
relative to the parent nucleic acid sequence. The parent nucleic acid sequence is
also altered to yield a second synthetic nucleic acid molecule having an
increased number of a second plurality of codons that are employed more
frequently in the host cell relative to the number of those codons in the parent
nucleic acid sequence, wherein the first plurality of codons is different than the
second plurality of codons, and wherein the first and the second synthetic nucleic
acid molecules preferably encode the same polypeptide. Optionally, the second
synthetic nucleic acid molecule has a decreased number of transcription
regulatory sequences relative to the parent nucleic acid sequence. Either or both
synthetic molecules can then be further modified.

Clearly, the present invention has applications with many genes and
across many fields of science including, but not limited to, life science research,
agrigenetics, genetic therapy, developmental science and pharmaceutical
development.

Brief Description of the Figures 

Figure 1. Codons and their corresponding amino acids.

Figure 2. A nucleotide sequence comparison of a yellow-green (YG)
click beetle luciferase nucleic acid sequence (YG#81-6G01; SEQ ID NO:2) and
various synthetic green (GR) click beetle luciferase nucleic acid sequences
(GRver1, SEQ ID NO:3; GRver2, SEQ ID NO:4; GRver3, SEQ ID NO:5;
GRver4, SEQ ID NO:6; GRver5, SEQ ID NO:7; GR6, SEQ ID NO:8; GRver5.1,
SEQ ID NO:9) and various red (RD) click beetle luciferase nucleic acid
sequences (RDver1, SEQ ID NO:10; RDver2, SEQ ID NO:11; RDver3, SEQ ID
NO:12; RDver4, SEQ ID NO:13; RDver5, SEQ ID NO:14; RD7, SEQ ID
NO:15; RDver5.1, SEQ ID NO:16; RDver5.2, SEQ ID NO:17; RD156-1H9,
SEQ ID NO:18). The nucleotides enclosed in boxes are nucleotides that differ
from the nucleotide present at the homologous position in SEQ ID NO:2.

Figure 3. An amino acid sequence comparison of a YG click beetle
luciferase amino acid sequence (YG#81-6G01, SEQ ID NO:24) and various
synthetic GR click beetle luciferase amino acid sequences (GRver1, SEQ ID
NO:25; GRver2, SEQ ID NO:26; GRver3, SEQ ID NO:27; GRver4, SEQ ID
NO:28; GRver5, SEQ ID NO:29; GR6, SEQ ID NO:30; GRver5.1, SEQ ID
NO:31) and various red (RD) click beetle luciferase amino acid sequences
(RDver1, SEQ ID NO:32; RDver2, SEQ ID NO:33; RDver3, SEQ ID NO:34;
RDver4, SEQ ID NO:218; RDver5, SEQ ID NO:219; RD7, SEQ ID NO:220;
RDver5.1, SEQ ID NO:221; RDver5.2, SEQ ID NO:222; RD156-1H9, SEQ ID
NO:223). All amino acid sequences are inferred from the corresponding
nucleotide sequence. The amino acids enclosed in boxes are amino acids that
differ from the amino acid present at the homologous position in SEQ ID NO:24.

Figure 4. Codon usage in YG#81-6G01, GRver1, RDver1, GRver5, and
RDver5, and humans (HUM) and relative codon usage in YG#81-6G01, GRver5,
RDver5, and humans.

Figure 5. Codon usage summaries for YG#81-6G01 (Figure 5A), and
GR/RD synthetic nucleic acid sequences, GRver1 (Figure 5B), RDver1 (Figure
5C), GRver2 (Figure 5D), RDver2 (Figure 5E), GRver3 (Figure 5F), RDver3
(Figure 5G), GRver4 (Figure 5H), RDver4 (Figure 5I), GRver5 (Figure 5J),
RDver5 (Figure 5K).

Figure 6. Oligonucleotides employed to prepare synthetic GR/RD 
luciferase genes (SEQ ID Nos. 35-245). 

Figure 7. A nucleotide sequence comparison of a wild type Renilla
reniformis luciferase nucleic acid sequence, Genbank Accession No. M63501
(RELLUC, SEQ ID NO:19), and various synthetic Renilla luciferase nucleic acid
sequences (Rlucver1, SEQ ID NO:20; Rlucver2, SEQ ID NO:21; Rluc-final,
SEQ ID NO:22). The nucleotides enclosed in boxes are nucleotides that differ
from the nucleotide present at the homologous position in SEQ ID NO:19.

Figure 8. An amino acid sequence comparison of a wild type Renilla
reniformis luciferase amino acid sequence (RELLUC, SEQ ID NO:224) and
various synthetic Renilla reniformis luciferase amino acid sequences (Rlucver1,
SEQ ID NO:225; Rlucver2, SEQ ID NO:226; Rluc-final, SEQ ID NO:227). All
amino acid sequences are inferred from the corresponding nucleotide sequence.
The amino acids enclosed in boxes are amino acids that differ from the amino
acid present at the homologous position in SEQ ID NO:224.

Figure 9. Codon usage in wild-type (A) versus synthetic (B) Renilla
luciferase genes. For codon usage in selected organisms, see, e.g., Wada et al.,
1990; Sharp et al., 1988; Aota et al., 1988; and Sharp et al., 1987, and for plant
codons, Murray et al., 1989.

Figure 10. Oligonucleotides employed to prepare synthetic Renilla
luciferase gene (SEQ ID Nos. 246-292).

Figure 11. A nucleotide sequence comparison of a wild type yellow-green
(YG) click beetle luciferase nucleic acid sequence (LUCPPLYG, SEQ ID
NO:1) and the synthetic green click beetle luciferase nucleic acid sequences
(GRver5.1, SEQ ID NO:9) and the synthetic red click beetle luciferase nucleic
acid sequences (RD156-1H9, SEQ ID NO:18). The nucleotides enclosed in
boxes are nucleotides that differ from the nucleotide present at the homologous
position in SEQ ID NO:1. Both synthetic sequences have a codon composition
that differs from LUCPPLYG at more than 25% of the codons and have at least
3-fold fewer transcription regulatory sequences relative to a random selection of
codons at the codons which differ.

Figure 12. An amino acid sequence comparison of a wild type YG click
beetle luciferase amino acid sequence (LUCPPLYG, SEQ ID NO:23) and the
synthetic GR click beetle luciferase amino acid sequences (GRver5.1, SEQ ID
NO:31) and the red (RD) click beetle luciferase amino acid sequences
(RD156-1H9, SEQ ID NO:223). All amino acid sequences are inferred from the
corresponding nucleotide sequence. The amino acids enclosed in boxes are
amino acids that differ from the amino acid present at the homologous position
in SEQ ID NO:23.

Figure 13. pRL vector series. All of the vectors contain the Renilla wild
type or synthetic gene as further described herein. Figure 13A illustrates the
Renilla luciferase gene in the pGL3 vectors (Promega Corp.). Figure 13B
illustrates the Renilla luciferase co-reporter vector series. pRL-TK has the
herpes simplex virus (HSV) tk promoter; pRL-SV40 has the SV40 virus early
enhancer/promoter; pRL-CMV has the cytomegalovirus (CMV) enhancer and
immediate early promoter; pRL-null has MCS (multiple cloning sites) but no
promoter or enhancer; pRL-TK(Int-) has HSV/tk promoter without an intron that
is present in the other plasmids; pR-GL3B has the pGL3-Basic backbone
(Promega Corp.); pR-GL3 TK has the pGL3-Basic backbone with an HSV tk
promoter.

Figure 14. Half-life of synthetic (Rluc-final) and native Renilla
luciferases in CHO cells.

Figures 15A-B. In vitro transcription/translation of Renilla luciferase
nucleic acid sequences. A) t = 0-60 minutes; B) linear range.

Figures 15C-D. In vitro translation of native and synthetic (Rluc-final)
Renilla luciferase RNAs in a rabbit reticulocyte lysate. RNA was quantitated
and the same amount was employed as in the translation reaction shown in
Figures 15A-B. C) t = 0-60 minutes; D) linear range.

Figures 15E-F. Translation of native and synthetic (Rluc-final) Renilla
RNAs in a wheat germ extract. E) t = 0-60 minutes; F) linear range.

Figure 16. High expression from a synthetic Renilla nucleic acid
sequence reduces the risk of promoter interference in a co-transfection assay.
CHO cells were co-transfected with a constant amount (50 ng) of firefly
luciferase expression vector (pGL3 control vector, with SV40 promoter and
enhancer; Luc+) and a pRL vector having a native (0 ng, 50 ng, 100 ng, 500 ng,
1 µg or 2 µg) or synthetic (0 ng, 5 ng, 10 ng, 50 ng, 100 ng or 200 ng) Renilla
luciferase gene.

Figures 17A-B. The reactions catalyzed by firefly and click beetle (17A)
and Renilla (17B) luciferases.

Figure 18. Nucleotide and inferred amino acid sequence of click beetle
luciferases in pGL3 vectors (GRver5.1 in pGL3, SEQ ID NO:297 encoding SEQ
ID NO:298; RDver5.1 in pGL3, SEQ ID NO:299 encoding SEQ ID NO:300; and
RD156-1H9 in pGL3, SEQ ID NO:301 encoding SEQ ID NO:302). To clone
GRver5.1, RDver5.1, and RD156-1H9 nucleic acid sequences into pGL3
vectors, an oligonucleotide having an Nco I site at the initiation codon was
employed, which resulted in an amino acid substitution at position 2 to valine.

Detailed Description of the Invention

Definitions 

The term "gene" as used herein, refers to a DNA sequence that comprises
coding sequences necessary for the production of a polypeptide or protein
precursor. The polypeptide can be encoded by a full length coding sequence or
by any portion of the coding sequence, as long as the desired protein activity is
retained.

A "nucleic acid", as used herein, is a covalently linked sequence of
nucleotides in which the 3' position of the pentose of one nucleotide is joined by
a phosphodiester group to the 5' position of the pentose of the next, and in which
the nucleotide residues (bases) are linked in specific sequence, i.e., a linear order
of nucleotides. A "polynucleotide", as used herein, is a nucleic acid containing a
sequence that is greater than about 100 nucleotides in length. An
"oligonucleotide", as used herein, is a short polynucleotide or a portion of a
polynucleotide. An oligonucleotide typically contains a sequence of about two
to about one hundred bases. The word "oligo" is sometimes used in place of the
word "oligonucleotide".

Nucleic acid molecules are said to have a "5'-terminus" (5' end) and a
"3'-terminus" (3' end) because nucleic acid phosphodiester linkages occur to the
5' carbon and 3' carbon of the pentose ring of the substituent mononucleotides.
The end of a polynucleotide at which a new linkage would be to a 5' carbon is its
5' terminal nucleotide. The end of a polynucleotide at which a new linkage
would be to a 3' carbon is its 3' terminal nucleotide. A terminal nucleotide, as
used herein, is the nucleotide at the end position of the 3'- or 5'-terminus.

DNA molecules are said to have "5' ends" and "3' ends" because
mononucleotides are reacted to make oligonucleotides in a manner such that the
5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of
its neighbor in one direction via a phosphodiester linkage. Therefore, an end of
an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to
the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3'
oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose
ring.

As used herein, a nucleic acid sequence, even if internal to a larger
oligonucleotide or polynucleotide, also may be said to have 5' and 3' ends. In
either a linear or circular DNA molecule, discrete elements are referred to as
being "upstream" or 5' of the "downstream" or 3' elements. This terminology
reflects the fact that transcription proceeds in a 5' to 3' fashion along the DNA
strand. Promoter and enhancer elements that direct transcription of a
linked gene are generally located 5' or upstream of the coding region. However,
enhancer elements can exert their effect even when located 3' of the promoter
element and the coding region. Transcription termination and polyadenylation
signals are located 3' or downstream of the coding region.

The term "codon" as used herein, is a basic genetic coding unit,
consisting of a sequence of three nucleotides that specify a particular amino acid
to be incorporated into a polypeptide chain, or a start or stop signal. Figure 1
contains a codon table. The term "coding region" when used in reference to a
structural gene refers to the nucleotide sequences that encode the amino acids
found in the nascent polypeptide as a result of translation of an mRNA molecule.
Typically, the coding region is bounded on the 5' side by the nucleotide triplet
"ATG" which encodes the initiator methionine and on the 3' side by a stop codon
(e.g., TAA, TAG, TGA). In some cases the coding region is also known to
initiate by a nucleotide triplet "TTG".
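
As an illustration of the definition of a coding region, the following minimal sketch scans a DNA string for the first start triplet and reads in frame to the first stop codon. The function name `find_coding_region` and its parameters are hypothetical and are not part of the specification; the sketch ignores the complementary strand, introns, and any biological evidence that the region is actually translated.

```python
# Stop codons named in the text above.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(dna, start_codons=("ATG",)):
    """Return (start, end) of the first start-to-stop reading frame, or None.

    `start` is the index of the first base of the start triplet and `end`
    is one past the last base of the in-frame stop codon, so that
    dna[start:end] is the region including the stop codon.
    """
    dna = dna.upper()
    for i in range(len(dna) - 2):
        if dna[i:i + 3] in start_codons:
            # Walk codon by codon in the frame set by the start triplet.
            for j in range(i + 3, len(dna) - 2, 3):
                if dna[j:j + 3] in STOP_CODONS:
                    return i, j + 3
    return None


region = find_coding_region("CCATGAAATTTTAGGG")
```

Passing `start_codons=("ATG", "TTG")` lets the same sketch accept the alternative initiation triplet mentioned above.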

By "protein" and "polypeptide" is meant any chain of amino acids,
regardless of length or post-translational modification (e.g., glycosylation or
phosphorylation). The synthetic genes of the invention may also encode a
variant of a naturally-occurring protein or polypeptide fragment thereof.
Preferably, such a protein or polypeptide has an amino acid sequence that is at least
85%, preferably 90%, and most preferably 95% or 99% identical to the amino
acid sequence of the naturally-occurring (native) protein from which it is
derived.

Polypeptide molecules are said to have an "amino terminus"
(N-terminus) and a "carboxy terminus" (C-terminus) because peptide linkages
occur between the backbone amino group of a first amino acid residue and the
backbone carboxyl group of a second amino acid residue. The terms
"N-terminal" and "C-terminal" in reference to polypeptide sequences refer to
regions of polypeptides including portions of the N-terminal and C-terminal
regions of the polypeptide, respectively. A sequence that includes a portion of
the N-terminal region of a polypeptide includes amino acids predominantly from
the N-terminal half of the polypeptide chain, but is not limited to such
sequences. For example, an N-terminal sequence may include an interior portion
of the polypeptide sequence including residues from both the N-terminal and
C-terminal halves of the polypeptide. The same applies to C-terminal regions.
N-terminal and C-terminal regions may, but need not, include the amino acid
defining the ultimate N-terminus and C-terminus of the polypeptide,
respectively.

The term "wild type" as used herein, refers to a gene or gene product that
has the characteristics of that gene or gene product isolated from a naturally
occurring source. A wild type gene is that which is most frequently observed in
a population and is thus arbitrarily designated the "wild type" form of the gene.
In contrast, the term "mutant" refers to a gene or gene product that displays
modifications in sequence and/or functional properties (i.e., altered
characteristics) when compared to the wild type gene or gene product. It is noted
that naturally-occurring mutants can be isolated; these are identified by the fact
that they have altered characteristics when compared to the wild type gene or
gene product.

The terms "complementary" or "complementarity" are used in reference
to a sequence of nucleotides related by the base-pairing rules. For example,
the sequence 5' "A-G-T" 3' is complementary to the sequence 3' "T-C-A" 5'.
Complementarity may be "partial," in which only some of the nucleic acids'
bases are matched according to the base pairing rules. Or, there may be
"complete" or "total" complementarity between the nucleic acids. The degree of
complementarity between nucleic acid strands has significant effects on the
efficiency and strength of hybridization between nucleic acid strands. This is of
particular importance in amplification reactions, as well as detection methods
which depend upon hybridization of nucleic acids.
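
The base-pairing rules and the distinction between partial and complete complementarity can be sketched as follows. This is an illustrative example only: the helper names `reverse_complement` and `complementarity` are hypothetical, both strands are written 5' to 3', only the four DNA bases are handled, and the two strands are assumed to be of equal length and aligned end to end without gaps.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementarity(strand, other):
    """Fraction of positions that base pair when two equal-length strands,
    both written 5' to 3', are aligned antiparallel."""
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    partner = reversed(other.upper())  # read the second strand 3' to 5'
    paired = sum(COMPLEMENT.get(a) == b for a, b in zip(strand.upper(), partner))
    return paired / len(strand)


# 5'-AGT-3' pairs with 3'-TCA-5', which is 5'-ACT-3' when written 5' to 3'.
complete = complementarity("AGT", "ACT")
partial = complementarity("AGT", "ACC")
```

In this sketch a value of 1.0 corresponds to "complete" complementarity and any value between 0 and 1 to "partial" complementarity.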

The term "recombinant protein" or "recombinant polypeptide" as used
herein refers to a protein molecule expressed from a recombinant DNA
molecule. In contrast, the term "native protein" is used herein to indicate a
protein isolated from a naturally occurring (i.e., a nonrecombinant) source.
Molecular biological techniques may be used to produce a recombinant form of a
protein with identical properties as compared to the native form of the protein.

The terms "fusion protein" and "fusion partner" refer to a chimeric
protein containing the protein of interest (e.g., luciferase) joined to an exogenous
protein fragment (e.g., a fusion partner which consists of a non-luciferase
protein). The fusion partner may, for example, enhance the solubility of the
protein as expressed in a host cell, provide an affinity tag to allow purification of
the recombinant fusion protein from the host cell or culture supernatant, or both.
If desired, the fusion partner may be removed from the protein of interest by a
variety of enzymatic or chemical means known to the art.

The terms "cell," "cell line," and "host cell," as used herein, are used
interchangeably, and all such designations include progeny or potential progeny
of these designations. By "transformed cell" is meant a cell into which (or into
an ancestor of which) has been introduced a DNA molecule comprising a
synthetic gene. Optionally, a synthetic gene of the invention may be introduced
into a suitable cell line so as to create a stably-transfected cell line capable of
producing the protein or polypeptide encoded by the synthetic gene. Vectors,
cells, and methods for constructing such cell lines are well known in the art, e.g.,
in Ausubel, et al. (infra). The words "transformants" or "transformed cells"
include the primary transformed cells derived from the originally transformed
cell without regard to the number of transfers. All progeny may not be precisely
identical in DNA content, due to deliberate or inadvertent mutations.
Nonetheless, mutant progeny that have the same functionality as screened for in
the originally transformed cell are included in the definition of transformants.

Nucleic acids are known to contain different types of mutations. A
"point" mutation refers to an alteration in the sequence of a nucleotide at a single
base position from the wild type sequence. Mutations may also refer to insertion
or deletion of one or more bases, so that the nucleic acid sequence differs from
the wild-type sequence.

The term "homology" refers to a degree of complementarity. There may
be partial homology or complete homology (i.e., identity). Homology is often
measured using sequence analysis software (e.g., Sequence Analysis Software
Package of the Genetics Computer Group, University of Wisconsin
Biotechnology Center, 1710 University Avenue, Madison, WI 53705). Such
software matches similar sequences by assigning degrees of homology to various
substitutions, deletions, insertions, and other modifications. Conservative
substitutions typically include substitutions within the following groups:
glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid,
asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine,
tyrosine.
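
The conservative substitution groups listed above can be encoded directly. The following is a minimal sketch, assuming one-letter amino acid codes; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the sketch does not reproduce the scoring used by any particular sequence analysis package.

```python
# One set per group listed in the text, using one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"G", "A"},            # glycine, alanine
    {"V", "I", "L"},       # valine, isoleucine, leucine
    {"D", "E", "N", "Q"},  # aspartic acid, glutamic acid, asparagine, glutamine
    {"S", "T"},            # serine, threonine
    {"K", "R"},            # lysine, arginine
    {"F", "Y"},            # phenylalanine, tyrosine
]


def is_conservative(original, replacement):
    """True if the two different residues fall within the same listed group."""
    a, b = original.upper(), replacement.upper()
    return a != b and any(a in group and b in group for group in CONSERVATIVE_GROUPS)


lys_to_arg = is_conservative("K", "R")
gly_to_trp = is_conservative("G", "W")
```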

A "partially complementary" sequence, i.e., one that at least partially
inhibits a completely complementary sequence from hybridizing to a target
nucleic acid, is referred to using the functional term "substantially homologous."
The inhibition of hybridization of the completely complementary sequence to the
target sequence may be examined using a hybridization assay (Southern or
Northern blot, solution hybridization and the like) under conditions of low
stringency. A substantially homologous sequence or probe will compete for and
inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a
target under conditions of low stringency. This is not to say that conditions of
low stringency are such that non-specific binding is permitted; low stringency
conditions require that the binding of two sequences to one another be a specific
(i.e., selective) interaction. The absence of non-specific binding may be tested
by the use of a second target which lacks even a partial degree of
complementarity (e.g., less than about 30% identity). In this case, in the absence
of non-specific binding, the probe will not hybridize to the second
non-complementary target.

When used in reference to a double-stranded nucleic acid sequence such
as a cDNA or a genomic clone, the term "substantially homologous" refers to
any probe which can hybridize to either or both strands of the double-stranded
nucleic acid sequence under conditions of low stringency as described herein.

"Probe" refers to an oligonucleotide designed to be sufficiently
complementary to a sequence in a denatured nucleic acid to be probed (in
relation to its length) to be bound under selected stringency conditions.

"Hybridization" and "binding" in the context of probes and denatured
(melted) nucleic acid are used interchangeably. Probes which are hybridized or
bound to denatured nucleic acid are base paired to complementary sequences in
the polynucleotide. Whether or not a particular probe remains base paired with
the polynucleotide depends on the degree of complementarity, the length of the
probe, and the stringency of the binding conditions. The higher the stringency,
the higher must be the degree of complementarity and/or the longer the probe.

The term "hybridization" is used in reference to the pairing of
complementary nucleic acid strands. Hybridization and the strength of
hybridization (i.e., the strength of the association between nucleic acid strands) are
impacted by many factors well known in the art including the degree of
complementarity between the nucleic acids, the stringency of the conditions
involved (affected by such conditions as the concentration of salts), the Tm
(melting temperature) of the formed hybrid, the presence of other components
(e.g., the presence or absence of polyethylene glycol), the molarity of the
hybridizing strands and the G:C content of the nucleic acid strands.

The term "stringency" is used in reference to the conditions of
temperature, ionic strength, and the presence of other compounds, under which
nucleic acid hybridizations are conducted. With "high stringency" conditions,
nucleic acid base pairing will occur only between nucleic acid fragments that
have a high frequency of complementary base sequences. Thus, conditions of
"medium" or "low" stringency are often required when it is desired that nucleic
acids which are not completely complementary to one another be hybridized or
annealed together. The art knows well that numerous equivalent conditions can
be employed to comprise medium or low stringency conditions. The choice of
hybridization conditions is generally evident to one skilled in the art and is
usually guided by the purpose of the hybridization, the type of hybridization
(DNA-DNA or DNA-RNA), and the level of desired relatedness between the
sequences (see, e.g., Sambrook et al., 1989; Nucleic Acid Hybridization, A Practical
Approach, IRL Press, Washington D.C., 1985, for a general discussion of the
methods).

The stability of nucleic acid duplexes is known to decrease with an
increased number of mismatched bases, and further to be decreased to a greater
or lesser degree depending on the relative positions of mismatches in the hybrid
duplexes. Thus, the stringency of hybridization can be used to maximize or
minimize stability of such duplexes. Hybridization stringency can be altered by:
adjusting the temperature of hybridization; adjusting the percentage of helix
destabilizing agents, such as formamide, in the hybridization mix; and adjusting
the temperature and/or salt concentration of the wash solutions. For filter
hybridizations, the final stringency of hybridizations often is determined by the
salt concentration and/or temperature used for the post-hybridization washes.

"High stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C
in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and
1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's
reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a
solution comprising 0.1X SSPE, 1.0% SDS at 42°C when a probe of about 500
nucleotides in length is employed.

"Medium stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C
in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and
1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's
reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a
solution comprising 1.0X SSPE, 1.0% SDS at 42°C when a probe of about 500
nucleotides in length is employed.

"Low stringency conditions" comprise conditions equivalent to binding
or hybridization at 42°C in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9
g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1%
SDS, 5X Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g Ficoll
(Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 µg/ml denatured
salmon sperm DNA followed by washing in a solution comprising 5X SSPE,
0.1% SDS at 42°C when a probe of about 500 nucleotides in length is employed.

The term "Tm" is used in reference to the "melting temperature". The
melting temperature is the temperature at which 50% of a population of
double-stranded nucleic acid molecules becomes dissociated into single strands.
The equation for calculating the Tm of nucleic acids is well-known in the art.
The Tm of a hybrid nucleic acid is often estimated using a formula adopted from
hybridization assays in 1 M salt, and commonly used for calculating Tm for PCR
primers: [(number of A + T) x 2°C + (number of G + C) x 4°C]. (C.R. Newton et
al., PCR, 2nd Ed., Springer-Verlag (New York, 1997), p. 24). This formula was
found to be inaccurate for primers longer than 20 nucleotides. (Id.) Another
simple estimate of the Tm value may be calculated by the equation: Tm = 81.5 +
0.41(% G + C), when a nucleic acid is in aqueous solution at 1 M NaCl. (e.g.,
Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid
Hybridization, 1985). Other more sophisticated computations exist in the art
which take structural as well as sequence characteristics into account for the
calculation of Tm. A calculated Tm is merely an estimate; the optimum
temperature is commonly determined empirically.
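
The two simple Tm estimates given above can be written out as a short sketch. The function names `tm_primer_rule` and `tm_gc_percent` are hypothetical; the code only restates the two formulas from the text (both in degrees Celsius, for 1 M salt) and does not attempt the more sophisticated computations mentioned. As noted above, any calculated value is an estimate.

```python
def _count_gc(seq):
    """Number of G and C bases in a sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def tm_primer_rule(seq):
    """(number of A + T) x 2 + (number of G + C) x 4, in degrees Celsius.

    The text notes this estimate is inaccurate for primers longer than
    20 nucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * _count_gc(seq)


def tm_gc_percent(seq):
    """Tm = 81.5 + 0.41 * (% G + C), for a nucleic acid at 1 M NaCl."""
    if not seq:
        raise ValueError("sequence must not be empty")
    percent_gc = 100.0 * _count_gc(seq) / len(seq)
    return 81.5 + 0.41 * percent_gc


primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer with 50% G + C
estimate_short = tm_primer_rule(primer)
estimate_long = tm_gc_percent(primer)
```

The two estimates differ widely for the same sequence because, per the text, the first is used for short PCR primers and the second is a separate simple estimate for nucleic acid in 1 M NaCl; neither replaces empirical determination.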

The term "isolated" when used in relation to a nucleic acid, as in "isolated
oligonucleotide" or "isolated polynucleotide" refers to a nucleic acid sequence
that is identified and separated from at least one contaminant with which it is
ordinarily associated in its source. Thus, an isolated nucleic acid is present in a
form or setting that is different from that in which it is found in nature. In
contrast, non-isolated nucleic acids (e.g., DNA and RNA) are found in the state
they exist in nature. For example, a given DNA sequence (e.g., a gene) is found
on the host cell chromosome in proximity to neighboring genes; RNA sequences
(e.g., a specific mRNA sequence encoding a specific protein), are found in the
cell as a mixture with numerous other mRNAs that encode a multitude of
proteins. However, isolated nucleic acid includes, by way of example, such
nucleic acid in cells ordinarily expressing that nucleic acid where the nucleic
acid is in a chromosomal location different from that of natural cells, or is
otherwise flanked by a different nucleic acid sequence than that found in nature.
The isolated nucleic acid or oligonucleotide may be present in single-stranded or
double-stranded form. When an isolated nucleic acid or oligonucleotide is to be
utilized to express a protein, the oligonucleotide contains at a minimum, the
sense or coding strand (i.e., the oligonucleotide may be single-stranded), but may
contain both the sense and anti-sense strands (i.e., the oligonucleotide may be
double-stranded).

The term "isolated" when used in relation to a polypeptide, as in "isolated
protein" or "isolated polypeptide" refers to a polypeptide that is identified and
separated from at least one contaminant with which it is ordinarily associated in
its source. Thus, an isolated polypeptide is present in a form or setting that is
different from that in which it is found in nature. In contrast, non-isolated
polypeptides (e.g., proteins and enzymes) are found in the state they exist in
nature.

The term "purified" or "to purify" means the result of any process that
removes some of a contaminant from the component of interest, such as a protein
or nucleic acid. The percent of a purified component is thereby increased in the
sample.

The term "operably linked" as used herein refers to the linkage of nucleic
acid sequences in such a manner that a nucleic acid molecule capable of
directing the transcription of a given gene and/or the synthesis of a desired
protein molecule is produced. The term also refers to the linkage of sequences
encoding amino acids in such a manner that a functional (e.g., enzymatically
active, capable of binding to a binding partner, capable of inhibiting, etc.) protein
or polypeptide is produced.

The term "recombinant DNA molecule" means a hybrid DNA sequence
comprising at least two nucleotide sequences not normally found together in
nature.

The term "vector" is used in reference to nucleic acid molecules
into which fragments of DNA may be inserted or cloned, which can be used to
transfer DNA segment(s) into a cell, and which are capable of replication in a
cell. Vectors may be derived from plasmids, bacteriophages, viruses, cosmids,
and the like.

The terms "recombinant vector" and "expression vector" as used herein
refer to DNA or RNA sequences containing a desired coding sequence and
appropriate DNA or RNA sequences necessary for the expression of the operably
linked coding sequence in a particular host organism. Prokaryotic expression
vectors include a promoter, a ribosome binding site, an origin of replication for
autonomous replication in a host cell and possibly other sequences, e.g., an
optional operator sequence and optional restriction enzyme sites. A promoter is
defined as a DNA sequence that directs RNA polymerase to bind to DNA and to
initiate RNA synthesis. Eukaryotic expression vectors include a promoter,
optionally a polyadenylation signal and optionally an enhancer sequence.

The term "a polynucleotide having a nucleotide sequence encoding a
gene," means a nucleic acid sequence comprising the coding region of a gene, or
in other words the nucleic acid sequence which encodes a gene product. The
coding region may be present in either a cDNA, genomic DNA or RNA form.
When present in a DNA form, the oligonucleotide may be single-stranded (i.e.,
the sense strand) or double-stranded. Suitable control elements such as
enhancers/promoters, splice junctions, polyadenylation signals, etc. may be
placed in close proximity to the coding region of the gene if needed to permit
proper initiation of transcription and/or correct processing of the primary RNA
transcript. Alternatively, the coding region utilized in the expression vectors of
the present invention may contain endogenous enhancers/promoters, splice
junctions, intervening sequences, polyadenylation signals, etc. In further
embodiments, the coding region may contain a combination of both endogenous
and exogenous control elements.

The term "transcription regulatory element" or "transcription regulatory
sequence" refers to a genetic element or sequence that controls some aspect of
the expression of nucleic acid sequence(s). For example, a promoter is a
regulatory element that facilitates the initiation of transcription of an operably
linked coding region. Other regulatory elements include, but are not limited to,
transcription factor binding sites, splicing signals, polyadenylation signals,
termination signals and enhancer elements.

Transcriptional control signals in eukaryotes comprise "promoter" and
"enhancer" elements. Promoters and enhancers consist of short arrays of DNA
sequences that interact specifically with cellular proteins involved in
transcription (Maniatis et al., 1987). Promoter and enhancer elements have been
isolated from a variety of eukaryotic sources including genes in yeast, insect and
mammalian cells. Promoter and enhancer elements have also been isolated from
viruses, and analogous control elements, such as promoters, are also found in
prokaryotes. The selection of a particular promoter and enhancer depends on the
cell type used to express the protein of interest. Some eukaryotic promoters and
enhancers have a broad host range while others are functional in a limited subset
of cell types (for review, see Voss et al., 1986; and Maniatis et al., 1987). For
example, the SV40 early gene enhancer is very active in a wide variety of cell
types from many mammalian species and has been widely used for the
expression of proteins in mammalian cells (Dijkema et al., 1985). Other
examples of promoter/enhancer elements active in a broad range of mammalian
cell types are those from the human elongation factor 1α gene (Uetsuki et al.,
1989; Kim, et al., 1990; and Mizushima and Nagata, 1990), the long terminal
repeats of the Rous sarcoma virus (Gorman et al., 1982), and the human
cytomegalovirus (Boshart et al., 1985).

The term "promoter/enhancer" denotes a segment of DNA containing
sequences capable of providing both promoter and enhancer functions (i.e., the
functions provided by a promoter element and an enhancer element as described
above). For example, the long terminal repeats of retroviruses contain both
promoter and enhancer functions. The enhancer/promoter may be "endogenous"
or "exogenous" or "heterologous." An "endogenous" enhancer/promoter is one
that is naturally linked with a given gene in the genome. An "exogenous" or
"heterologous" enhancer/promoter is one that is placed in juxtaposition to a gene
by means of genetic manipulation (i.e., molecular biological techniques) such
that transcription of the gene is directed by the linked enhancer/promoter.

The presence of "splicing signals" on an expression vector often results in
higher levels of expression of the recombinant transcript in eukaryotic host cells.
Splicing signals mediate the removal of introns from the primary RNA
transcript and consist of a splice donor and acceptor site (Sambrook, et al.,
Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor
Laboratory Press, New York, 1989, pp. 16.7-16.8). A commonly used splice
donor and acceptor site is the splice junction from the 16S RNA of SV40.

Efficient expression of recombinant DNA sequences in eukaryotic cells
requires expression of signals directing the efficient termination and
polyadenylation of the resulting transcript. Transcription termination signals are
generally found downstream of the polyadenylation signal and are a few hundred
nucleotides in length. The term "poly(A) site" or "poly(A) sequence" as used
herein denotes a DNA sequence which directs both the termination and
polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the
recombinant transcript is desirable, as transcripts lacking a poly(A) tail are
unstable and are rapidly degraded. The poly(A) signal utilized in an expression
vector may be "heterologous" or "endogenous." An endogenous poly(A) signal
is one that is found naturally at the 3' end of the coding region of a given gene in
the genome. A heterologous poly(A) signal is one which has been isolated from
one gene and positioned 3' to another gene. A commonly used heterologous
poly(A) signal is the SV40 poly(A) signal. The SV40 poly(A) signal is
contained on a 237 bp BamH I/Bcl I restriction fragment and directs both
termination and polyadenylation (Sambrook, et al., supra).

Eukaryotic expression vectors may also contain "viral replicons" or "viral
origins of replication." Viral replicons are viral DNA sequences which allow for
the extrachromosomal replication of a vector in a host cell expressing the
appropriate replication factors. Vectors containing either the SV40 or polyoma
virus origin of replication replicate to high copy number (up to 10^4 copies/cell)
in cells that express the appropriate viral T antigen. In contrast, vectors
containing the replicons from bovine papillomavirus or Epstein-Barr virus
replicate extrachromosomally at low copy number (about 100 copies/cell).

The term "in vitro" refers to an artificial environment and to processes or
reactions that occur within an artificial environment. In vitro environments
include, but are not limited to, test tubes and cell lysates. The term "in situ"
refers to cell culture. The term "in vivo" refers to the natural environment (e.g.,
an animal or a cell) and to processes or reactions that occur within a natural
environment.

The term "expression system" refers to any assay or system for
determining (e.g., detecting) the expression of a gene of interest. Those skilled
in the field of molecular biology will understand that any of a wide variety of
expression systems may be used. A wide range of suitable mammalian cells are
available from a wide range of sources (e.g., the American Type Culture
Collection, Rockland, MD). The method of transformation or transfection and
the choice of expression vehicle will depend on the host system selected.
Transformation and transfection methods are described, e.g., in Ausubel, et al.,
Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1992.
Expression systems include in vitro gene expression assays where a gene of
interest (e.g., a reporter gene) is linked to a regulatory sequence and the
expression of the gene is monitored following treatment with an agent that
inhibits or induces expression of the gene. Detection of gene expression can be
through any suitable means including, but not limited to, detection of expressed
mRNA or protein (e.g., a detectable product of a reporter gene) or through a
detectable change in the phenotype of a cell expressing the gene of interest.
Expression systems may also comprise assays where a cleavage event or other
nucleic acid or cellular change is detected.



The term "enzyme" refers to molecules or molecule aggregates that are
responsible for catalyzing chemical and biological reactions. Such molecules are
typically proteins, but can also comprise short peptides, RNAs, ribozymes,
antibodies, and other molecules. A molecule that catalyzes chemical and
biological reactions is referred to as "having enzyme activity" or "having
catalytic activity."

All amino acid residues identified herein are in the natural
L-configuration. In keeping with standard polypeptide nomenclature (see J.
Biol. Chem., 243, 3557 (1969)), abbreviations for amino acid residues are as
shown in the following Table of Correspondence.

                TABLE OF CORRESPONDENCE

    1-Letter    3-Letter    AMINO ACID
    Y           Tyr         L-tyrosine
    G           Gly         glycine
    F           Phe         L-phenylalanine
    M           Met         L-methionine
    A           Ala         L-alanine
    S           Ser         L-serine
    I           Ile         L-isoleucine
    L           Leu         L-leucine
    T           Thr         L-threonine
    V           Val         L-valine
    P           Pro         L-proline
    K           Lys         L-lysine
    H           His         L-histidine
    Q           Gln         L-glutamine
    E           Glu         L-glutamic acid
    W           Trp         L-tryptophan
    R           Arg         L-arginine
    D           Asp         L-aspartic acid
    N           Asn         L-asparagine
    C           Cys         L-cysteine

The term "sequence homology" means the proportion of base matches
between two nucleic acid sequences or the proportion of amino acid matches
between two amino acid sequences. When sequence homology is expressed as a
percentage, e.g., 50%, the percentage denotes the proportion of matches over the
length of sequence from one sequence that is compared to some other sequence.
Gaps (in either of the two sequences) are permitted to maximize matching; gap
lengths of 15 bases or less are usually used, 6 bases or less are preferred with
2 bases or less more preferred. When using oligonucleotides as probes or
treatments, the sequence homology between the target nucleic acid and the
oligonucleotide sequence is generally not less than 17 target base matches out of
20 possible oligonucleotide base pair matches (85%); preferably not less than 9
matches out of 10 possible base pair matches (90%), and more preferably not less
than 19 matches out of 20 possible base pair matches (95%).
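
The percentages quoted above follow from a simple match count. The sketch below, with the hypothetical function name `percent_matches`, compares two already aligned sequences of equal length position by position; it deliberately omits gap handling, so it illustrates only the arithmetic (17 of 20 matches is 85%), not a full homology determination.

```python
def percent_matches(seq_a, seq_b):
    """Percentage of positions at which two aligned, equal-length
    sequences carry the same residue (no gaps considered)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical 20-base target and a probe differing at three positions.
target = "ACGTACGTACGTACGTACGT"
probe = "TCGTAGGTACATACGTACGT"
homology = percent_matches(target, probe)
```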

Two amino acid sequences are homologous if there is a partial or
complete identity between their sequences. For example, 85% homology means
that 85% of the amino acids are identical when the two sequences are aligned for
maximum matching. Gaps (in either of the two sequences being matched) are
allowed in maximizing matching; gap lengths of 5 or less are preferred with 2 or
less being more preferred. Alternatively and preferably, two protein sequences
(or polypeptide sequences derived from them of at least 100 amino acids in
length) are homologous, as this term is used herein, if they have an alignment
score of more than 5 (in standard deviation units) using the program ALIGN
with the mutation data matrix and a gap penalty of 6 or greater. See Dayhoff, M.
O., in Atlas of Protein Sequence and Structure, 1972, volume 5, National
Biomedical Research Foundation, pp. 101-110, and Supplement 2 to this
volume, pp. 1-10. The two sequences or parts thereof are more preferably
homologous if their amino acids are greater than or equal to 85% identical when
optimally aligned using the ALIGN program.

The following terms are used to describe the sequence relationships
between two or more polynucleotides: "reference sequence", "comparison
window", "sequence identity", "percentage of sequence identity", and
"substantial identity". A "reference sequence" is a defined sequence used as a
basis for a sequence comparison; a reference sequence may be a subset of a
larger sequence, for example, as a segment of a full-length cDNA or gene
sequence given in a sequence listing, or may comprise a complete cDNA or gene
sequence. Generally, a reference sequence is at least 20 nucleotides in length,
frequently at least 25 nucleotides in length, and often at least 50 nucleotides in
length. Since two polynucleotides may each (1) comprise a sequence (i.e., a
portion of the complete polynucleotide sequence) that is similar between the two
polynucleotides, and (2) may further comprise a sequence that is divergent
between the two polynucleotides, sequence comparisons between two (or more)
polynucleotides are typically performed by comparing sequences of the two
polynucleotides over a "comparison window" to identify and compare local
regions of sequence similarity.

A "comparison window", as used herein, refers to a conceptual segment of at
least 20 contiguous nucleotides, wherein the portion of the polynucleotide
sequence in the comparison window may comprise additions or deletions (i.e.,
gaps) of 20 percent or less as compared to the reference sequence (which does
not comprise additions or deletions) for optimal alignment of the two
sequences.

Methods of alignment of sequences for comparison are well known in the art.
Thus, the determination of percent identity between any two sequences can be
accomplished using a mathematical algorithm. Preferred, non-limiting examples
of such mathematical algorithms are the algorithm of Myers and Miller (1988);
the local homology algorithm of Smith and Waterman (1981); the homology
alignment algorithm of Needleman and Wunsch (1970); the
search-for-similarity method of Pearson and Lipman (1988); and the algorithm of
Karlin and Altschul (1990), modified as in Karlin and Altschul (1993).

Computer implementations of these mathematical algorithms can be utilized for
comparison of sequences to determine sequence identity. Such implementations
include, but are not limited to: CLUSTAL in the PC/Gene program (available
from Intelligenetics, Mountain View, California); the ALIGN program (Version
2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics
Software Package, Version 8 (available from Genetics Computer Group (GCG), 575
Science Drive, Madison, Wisconsin, USA). Alignments using these programs can
be performed using the default parameters. The CLUSTAL program is well
described by Higgins et al. (1988); Higgins et al. (1989); Corpet et al.
(1988); Huang et al. (1992); and Pearson et al. (1994). The ALIGN program is
based on the algorithm of Myers and Miller, supra. The BLAST programs of
Altschul et al. (1990) are based on the algorithm of Karlin and Altschul,
supra. To obtain gapped alignments for comparison purposes, Gapped BLAST (in
BLAST 2.0) can be utilized as described in Altschul et al. (1997).
Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated
search that detects distant relationships between molecules. See Altschul et
al., supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default
parameters of the respective programs (e.g., BLASTN for nucleotide sequences,
BLASTX for proteins) can be used. See http://www.ncbi.nlm.nih.gov. Alignment
may also be performed manually by inspection.

The term "sequence identity" means that two polynucleotide sequences are
identical (i.e., on a nucleotide-by-nucleotide basis) over the window of
comparison. The term "percentage of sequence identity" means that two
polynucleotide sequences are identical (i.e., on a nucleotide-by-nucleotide
basis) for the stated proportion of nucleotides over the window of comparison.
The term "percentage of sequence identity" is calculated by comparing two
optimally aligned sequences over the window of comparison, determining the
number of positions at which the identical nucleic acid base (e.g., A, T, C, G,
U, or I) occurs in both sequences to yield the number of matched positions,
dividing the number of matched positions by the total number of positions in
the window of comparison (i.e., the window size), and multiplying the result by
100 to yield the percentage of sequence identity. The term "substantial
identity" as used herein denotes a characteristic of a polynucleotide sequence,
wherein the polynucleotide comprises a sequence that has at least 60%,
preferably at least 65%, more preferably at least 70%, up to about 85%, and
even more preferably at least 90 to 95%, more usually at least 99%, sequence
identity as compared to a reference sequence over a comparison window of at
least 20 nucleotide positions, frequently over a window of at least 20-50
nucleotides, and preferably at least 300 nucleotides, wherein the percentage of
sequence identity is calculated by comparing the reference sequence to the
polynucleotide sequence which may include deletions or additions which total 20
percent or less of the reference sequence over the window of comparison. The
reference sequence may be a subset of a larger sequence.
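
By way of illustration only, the calculation of percentage of sequence identity
described above can be sketched in Python. The function name and the use of "-"
as the gap character are assumptions made for this example and do not belong to
any program named herein; the two sequences are assumed to have been aligned
beforehand, e.g., by one of the programs described above.

```python
def percent_identity(reference: str, candidate: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an aligned comparison window.

    Both strings must already be aligned (equal length, gaps written as `gap`).
    Matched positions are divided by the window size and multiplied by 100.
    """
    if len(reference) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    if not reference:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for a, b in zip(reference.upper(), candidate.upper())
        if a == b and a != gap  # a gap never counts as a matched position
    )
    return 100.0 * matches / len(reference)
```

For instance, 9 matched positions in a 10 position window give 90%, and 19
matched positions in a 20 position window give 95%.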

As applied to polypeptides, the term "substantial identity" means that two
peptide sequences, when optimally aligned, such as by the programs GAP or
BESTFIT using default gap weights, share at least about 85% sequence identity,
preferably at least about 90% sequence identity, more preferably at least about
95% sequence identity, and most preferably at least about 99% sequence
identity.

The Synthetic Nucleic Acid Molecules and Methods of the Invention

The invention provides compositions comprising synthetic nucleic acid
molecules, as well as methods for preparing those molecules which yield
synthetic nucleic acid molecules that are efficiently expressed as a
polypeptide or protein with desirable characteristics including reduced
inappropriate or unintended transcription characteristics when expressed in a
particular cell type.

Natural selection is the hypothesis that genotype-environment interactions
occurring at the phenotypic level lead to differential reproductive success of
individuals and hence to modification of the gene pool of a population. It is
generally accepted that the amino acid sequence of a protein found in nature
has undergone optimization by natural selection. However, amino acids exist
within the sequence of a protein that do not contribute significantly to the
activity of the protein and these amino acids can be changed to other amino
acids with little or no consequence. Furthermore, a protein may be useful
outside its natural environment or for purposes that differ from the conditions
of its natural selection. In these circumstances, the amino acid sequence can
be synthetically altered to better adapt the protein for its utility in various
applications.

Likewise, the nucleic acid sequence that encodes a protein is also 
optimized by natural selection. The relationship between coding DNA and its 
transcribed RNA is such that any change to the DNA affects the resulting RNA. 
Thus, natural selection works on both molecules simultaneously. However, this 

relationship does not exist between nucleic acids and proteins. Because multiple
codons encode the same amino acid, many different nucleotide sequences can 
encode an identical protein. A specific protein composed of 500 amino acids can 
theoretically be encoded by more than 10^^^ different nucleic acid sequences. 
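
As an illustrative sketch, the number of distinct nucleotide sequences able to
encode a given protein may be estimated by multiplying together the number of
synonymous codons available for each amino acid. The Python below assumes the
standard genetic code, and its names are chosen for this example only. Even a
500 residue protein composed entirely of amino acids having only two codons
would have 2^500 (about 3 x 10^150) possible coding sequences.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Number of synonymous codons per amino acid (e.g. 6 for Leu, 1 for Met).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein: str) -> int:
    """Number of nucleotide sequences that encode `protein` (one-letter code)."""
    total = 1
    for aa in protein.upper():
        if aa == "*" or aa not in DEGENERACY:
            raise ValueError(f"not a standard amino acid: {aa!r}")
        total *= DEGENERACY[aa]
    return total
```

Python integers are unbounded, so the exact count can be computed even for long
proteins.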

Natural selection acts on nucleic acids to achieve proper encoding of the 

corresponding protein. Presumably, other properties of nucleic acid molecules
are also acted upon by natural selection. These properties include codon usage 
frequency, RNA secondary structure, the efficiency of intron splicing, and 
interactions with transcription factors or other nucleic acid binding proteins. 
These other properties may alter the efficiency of protein translation and the
resulting phenotype. Because of the redundant nature of the genetic code, these
other attributes can be optimized by natural selection without altering the 
corresponding amino acid sequence. 

Under some conditions, it is useful to synthetically alter the natural 
nucleotide sequence encoding a protein to better adapt the protein for alternative 

applications. A common example is to alter the codon usage frequency of a gene
when it is expressed in a foreign host. Although redundancy in the genetic code
allows amino acids to be encoded by multiple codons, different organisms favor 
some codons over others. The codon usage frequencies tend to differ most for 
organisms with widely separated evolutionary histories. It has been found that 

when transferring genes between evolutionarily distant organisms, the efficiency
of protein translation can be substantially increased by adjusting the codon usage
frequency (see U.S. Patent Nos. 5,096,825, 5,670,356 and 5,874,304).

Because of the need for evolutionary distance, the codon usage of reporter
genes often does not correspond to the optimal codon usage of the experimental
cells. Examples include β-galactosidase (β-gal) and chloramphenicol
acetyltransferase (cat) reporter genes that are derived from E. coli and are
commonly used in mammalian cells; the β-glucuronidase (gus) reporter gene that
is derived from E. coli and commonly used in plant cells; the firefly
luciferase (luc) reporter gene that is derived from an insect and commonly used
in plant and mammalian cells; and the Renilla luciferase and green fluorescent
protein (gfp) reporter genes which are derived from coelenterates and are
commonly used in plant and mammalian cells. To achieve sensitive quantitation
of reporter gene expression, the activity of the gene product must not be
endogenous to the experimental host cells. Thus, reporter genes are usually
selected from organisms having unique and distinctive phenotypes. Consequently,
these organisms often have widely separated evolutionary histories from the
experimental host cells.

Previously, to create genes having a more optimal codon usage frequency 
but still encoding the same gene product, a synthetic nucleic acid sequence was 
made by replacing existing codons with codons that were generally more 
favorable to the experimental host cell (see U.S. Patent Nos. 5,096,825,
5,670,356 and 5,874,304). The result was a net improvement in codon usage
frequency of the synthetic gene. However, the optimization of other attributes 
was not considered and so these synthetic genes likely did not reflect genes 
optimized by natural selection. 
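
For illustration, codon usage frequency of the kind discussed above can be
tabulated from a coding sequence as the share of each codon among the
synonymous codons actually used for its amino acid. The following Python is a
minimal sketch that assumes the standard genetic code and an in-frame sequence
containing only A, C, G and T (or U); the function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(coding_sequence: str) -> dict:
    """Map each amino acid to {codon: fraction of that amino acid's codons}."""
    seq = coding_sequence.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of three")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq), 3))
    per_amino_acid = defaultdict(dict)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]][codon] = n  # KeyError on non-ACGT codons
    return {
        aa: {codon: n / sum(codons.values()) for codon, n in codons.items()}
        for aa, codons in per_amino_acid.items()
    }
```

Comparing such a table for a reporter gene against a table compiled from
highly expressed genes of the intended host indicates which codons are
candidates for replacement.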

In particular, improvements in codon usage frequency are intended only 

for optimization of an RNA sequence based on its role in translation into a

protein. Thus, previously described methods did not address how the sequence 
of a synthetic gene affects the role of DNA in transcription into RNA. Most 
notably, consideration had not been given as to how transcription factors may 
interact with the synthetic DNA and consequently modulate or otherwise 

influence gene transcription. For genes found in nature, the DNA would be
optimally transcribed by the native host cell and would yield an RNA that
encodes a properly folded gene product. In contrast, synthetic genes have
previously not been optimized for transcriptional characteristics. Rather, this
property has been ignored or left to chance. 

This concern is important for all genes, but particularly important for
reporter genes, which are most commonly used to quantitate transcriptional 
behavior in the experimental host cells. Hundreds of transcription factors have
been identified in different cell types under different physiological conditions, 
and likely more exist but have not yet been identified. All of these transcription 
factors can influence the transcription of an introduced gene. A useful synthetic 
reporter gene of the invention has a minimal risk of influencing or perturbing 
intrinsic transcriptional characteristics of the host cell because the structure of 
that gene has been altered. A particularly useful synthetic reporter gene will 
have desirable characteristics under a new set and/or a wide variety of 
experimental conditions. To best achieve these characteristics, the structure of 
the synthetic gene should have minimal potential for interacting with 
transcription factors within a broad range of host cells and physiological
conditions. Minimizing potential interactions between a reporter gene and a host 
cell's endogenous transcription factors increases the value of a reporter gene by 
reducing the risk of inappropriate transcriptional characteristics of the gene 
within a particular experiment, increasing applicability of the gene in various 
environments, and increasing the acceptance of the resulting experimental data.

In contrast, a reporter gene comprising a native nucleotide sequence, 
based on a genomic or cDNA clone from the original host organism, may 
interact with transcription factors when expressed in an exogenous host. This 
risk stems from two circumstances. First, the native nucleotide sequence
contains sequences that were optimized through natural selection to influence 
gene transcription within the native host organism. However, these sequences 
might also influence transcription when the gene is expressed in exogenous 
hosts, i.e., out of context, thus interfering with its performance as a reporter gene. 
Second, the nucleotide sequence may inadvertently interact with transcription
factors that were not present in the native host organism, and thus did not
participate in its natural selection. The probability of such inadvertent
interactions increases with greater evolutionary separation between the
experimental cells and the native organism of the reporter gene. 

These potential interactions with transcription factors would likely be
disrupted when using a synthetic reporter gene having alterations in codon usage
frequency. However, a synthetic reporter gene sequence, designed by choosing
codons based only on codon usage frequency, is likely to contain other 
unintended transcription factor binding sites since the synthetic gene has not 
been subjected to the benefit of natural selection to correct inappropriate 
transcriptional activities. Inadvertent interactions with transcription factors 
could also occur whenever the encoded amino acid sequence is artificially
altered, e.g., to introduce amino acid substitutions. Similarly, these changes 
have not been subjected to natural selection, and thus may exhibit undesired 
characteristics. 

Thus, the invention provides a method for preparing synthetic nucleic acid
sequences that reduce the risk of undesirable interactions of the nucleic acid
with transcription factors when expressed in a particular host cell, thereby
reducing inappropriate or unintended transcriptional characteristics.
Preferably, the method yields synthetic genes containing improved codon usage
frequencies for a particular host cell and with a reduced occurrence of
transcription factor binding sites. The invention also provides a method of
preparing synthetic genes containing improved codon usage frequencies with a
reduced occurrence of transcription factor binding sites and additional
beneficial structural attributes. Such additional attributes include the
absence of inappropriate RNA splicing junctions, poly(A) addition signals,
undesirable restriction sites, ribosomal binding sites, and secondary
structural motifs such as hairpin loops.
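
The screening of a candidate sequence for undesirable features of this kind
can be illustrated with a short Python sketch that searches both strands for
exact motifs. The two motifs shown (AATAAA, the canonical poly(A) signal
hexamer, and GAATTC, the EcoRI recognition site) are merely examples, and the
function names are hypothetical; a practical screen would draw its motifs from
a database and would typically allow degenerate matches.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motifs(sequence: str, motifs: dict) -> list:
    """Return sorted (position, strand, name) hits for each motif on both strands.

    Positions are 0-based forward-strand coordinates of the left-most base,
    including for hits on the reverse ("-") strand.
    """
    seq = sequence.upper()
    hits = []
    for name, motif in motifs.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            if strand == "-" and pattern == motif:
                continue  # palindromic site: already reported on the "+" strand
            start = seq.find(pattern)
            while start != -1:
                hits.append((start, strand, name))
                start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


UNWANTED = {"poly(A) signal": "AATAAA", "EcoRI site": "GAATTC"}
```

A sequence free of every listed motif returns an empty list, which provides a
simple acceptance test for a designed sequence.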

Also provided is a method for preparing two synthetic genes encoding
the same or highly similar proteins ("codon distinct" versions). Preferably, the 
two synthetic genes have a reduced ability to hybridize to a common 
polynucleotide probe sequence, or have a reduced risk of recombining when 

present together in living cells. To detect recombination, PCR amplification of
the reporter sequences using primers complementary to flanking sequences and
sequencing of the amplified sequences may be employed.

To select codons for the synthetic nucleic acid molecules of the invention,
preferred codons have a relatively high codon usage frequency in a selected
host cell, and their introduction results in the introduction of relatively few
transcription factor binding sites, relatively few other undesirable structural
attributes, and optionally a characteristic that distinguishes the synthetic
gene from another gene encoding a highly similar protein. Thus, the synthetic
nucleic acid product obtained by the method of the invention is a synthetic
gene with an improved level of expression due to improved codon usage
frequency, a reduced risk of inappropriate transcriptional behavior due to a
reduced number of undesirable transcription regulatory sequences, and
optionally any additional characteristic due to other criteria that may be
employed to select the synthetic sequence.

The invention may be employed with any nucleic acid sequence, e.g., a native
sequence such as a cDNA or one which has been manipulated in vitro, e.g., to
introduce specific alterations such as the introduction or removal of a
restriction enzyme recognition site, the alteration of a codon to encode a
different amino acid or to encode a fusion protein, or to alter GC or AT
content (% of composition) of nucleic acid molecules. Moreover, the method of
the invention is useful with any gene, but particularly useful for reporter
genes as well as other genes associated with the expression of reporter genes,
such as selectable markers. Preferred genes include, but are not limited to,
those encoding lactamase (β-gal), neomycin resistance (Neo), CAT, GUS,
galactopyranoside, GFP, xylosidase, thymidine kinase, arabinosidase and the
like. As used herein, a "marker gene" or "reporter gene" is a gene that imparts
a distinct phenotype to cells expressing the gene and thus permits cells having
the gene to be distinguished from cells that do not have the gene. Such genes
may encode either a selectable or screenable marker, depending on whether the
marker confers a trait which one can 'select' for by chemical means, i.e.,
through the use of a selective agent (e.g., a herbicide, antibiotic, or the
like), or whether it is simply a "reporter" trait that one can identify through
observation or testing, i.e., by 'screening'. Elements of the present
disclosure are exemplified in detail through the use of particular marker
genes. Of course, many examples of suitable marker genes or reporter genes are
known to the art and can be employed in the practice of the invention.
Therefore, it will be understood that
the following discussion is exemplary rather than exhaustive. In light of the
techniques disclosed herein and the general recombinant techniques which are
known in the art, the present invention renders possible the alteration of any
gene.

Exemplary marker genes include, but are not limited to, a neo gene, a β-gal
gene, a gus gene, a cat gene, a gpt gene, a hyg gene, a hisD gene, a ble gene,
a mprt gene, a bar gene, a nitrilase gene, a mutant acetolactate synthase gene
(ALS) or acetoacid synthase gene (AAS), a methotrexate-resistant dhfr gene, a
dalapon dehalogenase gene, a mutated anthranilate synthase gene that confers
resistance to 5-methyl tryptophan (WO 97/26366), an R-locus gene, a
β-lactamase gene, a xylE gene, an α-amylase gene, a tyrosinase gene, a
luciferase (luc) gene (e.g., a Renilla reniformis luciferase gene, a firefly
luciferase gene, or a click beetle luciferase (Pyrophorus plagiophthalamus)
gene), an aequorin gene,
or a green fluorescent protein gene. Included within the terms selectable or 
screenable marker genes are also genes which encode a "secretable marker" 
whose secretion can be detected as a means of identifying or selecting for 
transformed cells. Examples include markers which encode a secretable antigen 

that can be identified by antibody interaction, or even secretable enzymes which
can be detected by their catalytic activity. Secretable proteins fall into a number 
of classes, including small, diffusible proteins detectable, e.g., by ELISA, and 
proteins that are inserted or trapped in the cell membrane. 

The method of the invention can be performed by, although it is not 

limited to, a recursive process. The process includes assigning preferred codons
to each amino acid in a target molecule, e.g., a native nucleotide sequence, based 
on codon usage in a particular species, identifying potential transcription 
regulatory sequences such as transcription factor binding sites in the nucleic acid 
sequence having preferred codons, e.g., using a database of such binding sites, 

optionally identifying other undesirable sequences, and substituting an
alternative codon (i.e., encoding the same amino acid) at positions where
undesirable transcription factor binding sites or other sequences occur. For
codon distinct versions, alternative preferred codons are substituted in each
version. If necessary, the identification and elimination of potential transcription
factor or other undesirable sequences can be repeated until a nucleotide sequence
is achieved containing a maximum number of preferred codons and a minimum
number of undesired sequences including transcription regulatory sequences or
other undesirable sequences. Also, optionally, desired sequences, e.g., 
restriction enzyme recognition sites, can be introduced. After a synthetic nucleic 
acid molecule is designed and constructed, its properties relative to the parent 
nucleic acid sequence can be determined by methods well known to the art. For 

example, the expression of the synthetic and target nucleic acid molecules in a
series of vectors in a particular cell can be compared.
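
The recursive process described above can be sketched, under simplifying
assumptions, as a greedy loop in Python: the most preferred codon is assigned
to every amino acid, the sequence is searched for unwanted motifs, and a
synonymous codon is substituted whenever doing so lowers the number of motif
occurrences. This is an illustration only and not the procedure actually used
for the genes described herein. The function names, the codon ranking in the
usage note and the restriction to exact forward-strand motifs are all
assumptions of the example.

```python
def count_hits(sequence: str, motifs: list) -> int:
    """Count occurrences (overlaps included) of every unwanted motif."""
    return sum(
        sequence.startswith(motif, start)
        for motif in motifs
        for start in range(len(sequence) - len(motif) + 1)
    )


def design_sequence(protein: str, ranked_codons: dict, motifs: list) -> str:
    """Greedy sketch of the assign / identify / substitute cycle.

    `ranked_codons` maps each amino acid to its synonymous codons, most
    preferred first. The encoded protein is never changed.
    """
    # Step 1: assign the most preferred codon to every amino acid.
    codons = [ranked_codons[aa][0] for aa in protein]
    hits = count_hits("".join(codons), motifs)
    improved = True
    # Steps 2 and 3: repeat identification and substitution while it helps.
    while hits and improved:
        improved = False
        for index, aa in enumerate(protein):
            for alternative in ranked_codons[aa]:  # tried in order of preference
                trial = codons[:index] + [alternative] + codons[index + 1:]
                trial_hits = count_hits("".join(trial), motifs)
                if trial_hits < hits:
                    codons, hits, improved = trial, trial_hits, True
                    break
            if improved:
                break
    return "".join(codons)
```

With the hypothetical ranking {"E": ["GAA", "GAG"], "F": ["TTC", "TTT"]}, the
dipeptide Glu-Phe is first written as GAATTC, which is an EcoRI site; one
substitution yields GAGTTC, which encodes the same dipeptide without the site.
Because each accepted substitution strictly lowers the hit count, the loop
always terminates.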

Thus, generally, the method of the invention comprises identifying a target
nucleic acid sequence, such as a vector backbone, a reporter gene or a
selectable marker gene, and a host cell of interest, for example, a plant
(dicot or monocot), fungus, yeast or mammalian cell. Preferred host cells are
mammalian host cells such as CHO, COS, 293, HeLa, CV-1 and NIH3T3 cells. Based
on preferred codon usage in the host cell(s) and, optionally, low codon usage
in the host cell(s), e.g., high usage mammalian codons and low usage E. coli
and mammalian codons, codons to be replaced are determined. For codon distinct
versions of two synthetic nucleic acid molecules, alternative preferred codons
are introduced to each version. Thus, for amino acids having more than two
codons, one preferred codon is introduced to one version and another preferred
codon is introduced to the other version. For amino acids having six codons,
the two codons with the largest number of mismatched bases are identified and
one is introduced to one version and the other codon is introduced to the other
version. Concurrent, subsequent or prior to selecting codons to be replaced,
desired and undesired sequences, such as undesired transcriptional regulatory
sequences, in the target sequence are identified. These sequences can be
identified using databases and software such as EPD, NNPD, REBASE, TRANSFAC,
TESS, GenePro, MAR (www.ncgr.org/MAR-search) and BCM Gene Finder, further
described herein. After the sequences are identified, the modification(s) are
introduced. Once a desired synthetic nucleic acid sequence is obtained, it can
be prepared by methods well known to the art (such as PCR with overlapping
primers), and its structural and functional properties compared to the target
nucleic acid sequence, including, but not limited to, percent homology,
presence or absence of certain sequences, for example, restriction sites,
percent of codons changed (such as an increased or decreased usage of certain
codons) and expression rates.
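
The selection, for an amino acid having six codons, of the two codons with the
largest number of mismatched bases can be illustrated as follows. The Python
sketch assumes the standard genetic code and uses hypothetical function names;
it considers mismatches only and ignores codon usage frequency, which in
practice would also be weighed when choosing codons for the two codon distinct
versions.

```python
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid: str) -> list:
    """All codons that encode `amino_acid`, in codon-table order."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def mismatches(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


def most_divergent_pair(amino_acid: str) -> tuple:
    """The pair of synonymous codons with the largest number of mismatched bases."""
    codons = synonymous_codons(amino_acid)
    if len(codons) < 2:
        raise ValueError(f"{amino_acid!r} has fewer than two codons")
    return max(combinations(codons, 2), key=lambda pair: mismatches(*pair))
```

Under the standard code, serine codons can differ at all three positions
(e.g., TCT and AGC), whereas leucine and arginine codons can differ at no more
than two, because their second base is fixed.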

As described below, the method was used to create synthetic reporter 
genes encoding Renilla reniformis luciferase, and two click beetle luciferases
(one emitting green light and the other emitting red light). For both systems, the 

synthetic genes support much greater levels of expression than the corresponding
native or parent genes for the protein. In addition, the native and parent genes 
demonstrated anomalous transcription characteristics when expressed in 
mammalian cells, which were not evident in the synthetic genes. In particular, 
basal expression of the native or parent genes is relatively high. Furthermore, 

the expression is induced to very high levels by an enhancer sequence in the

absence of known promoters. The synthetic genes show lower basal expression 
and do not show the anomalous enhancer behavior. Presumably, the enhancer is 
activating transcriptional elements found in the native genes that are absent in 
the synthetic genes. The results clearly show that the synthetic nucleic acid 

sequences exhibit superior performance as reporter genes.

Exemplary Uses of the Molecules of the Invention 

The synthetic genes of the invention preferably encode the same proteins 
as their native counterpart (or nearly so), but have improved codon usage while 

being largely devoid of known transcription regulatory elements in the coding
region. (It is recognized that a small number of amino acid changes may be 
desired to enhance a property of the native counterpart protein, e.g. to enhance 
luminescence of a luciferase.) This increases the level of expression of the 
protein the synthetic gene encodes and reduces the risk of anomalous expression 

of the protein. For example, studies of many important events of gene
regulation, which may be mediated by weak promoters, are limited by
insufficient reporter signals from inadequate expression of the reporter
proteins. The synthetic luciferase genes described herein permit detection of
weak promoter activity because of the large increase in level of expression,
which enables increased detection sensitivity. Also, the use of some selectable
markers may be limited by the expression of that marker in an exogenous cell.
Thus, synthetic selectable marker genes which have improved codon usage for
that cell, and have a decrease in other undesirable sequences (e.g.,
transcription factor binding sites), can permit the use of those markers in
cells that otherwise were undesirable as hosts for those markers.
Promoter crosstalk is another concern when a co-reporter gene is used to 

normalize transfection efficiencies. With the enhanced expression of synthetic
genes, the amount of DNA containing strong promoters can be reduced, or DNA 
containing weaker promoters can be employed, to drive the expression of the co- 
reporter. In addition, there may be a reduction in the background expression 
from the synthetic reporter genes of the invention. This characteristic makes 

synthetic reporter genes more desirable by minimizing the sporadic expression
from the genes and reducing the interference resulting from other regulatory 
pathways. 

The use of reporter genes in imaging systems, which can be used for in 
vivo biological studies or drug screening, is another use for the synthetic genes of 

the invention. Due to their increased level of expression, the protein encoded by
a synthetic gene is more readily detectable by an imaging system. In fact, using 
a synthetic Renilla luciferase gene, luminescence in transfected CHO cells was 
detected visually without the aid of instrumentation. 

In addition, the synthetic genes may be used to express fusion proteins, for
example fusions with secretion leader sequences or cellular localization
sequences, to study transcription in difficult-to-transfect cells such as
primary cells, and/or to improve the analysis of regulatory pathways and
genetic elements. Other uses include, but are not limited to, the detection of
rare events that require extreme sensitivity (e.g., studying RNA recoding), use
with IRES, to improve the efficiency of in vitro translation or in vitro
transcription-translation coupled systems such as TNT (Promega Corp., Madison,
WI), study of reporters optimized to different host organisms (e.g., plants,
fungus, and the like), use of multiple genes as co-reporters to monitor drug
toxicity, as reporter molecules in multiwell assays, and as reporter molecules
in drug screening with the advantage
of minimizing possible interference of reporter signal by different signal
transduction pathways and other regulatory mechanisms. 

Additionally, uses for the nucleic acid molecules of the invention include
fluorescence activated cell sorting (FACS), fluorescent microscopy, to detect
and/or measure the level of gene expression in vitro and in vivo (e.g., to
determine promoter strength), subcellular localization or targeting (fusion
protein), as a marker, in calibration, in a kit (e.g., for dual assays), for in
vivo imaging, to analyze regulatory pathways and genetic elements, and in
multi-well formats.

With respect to synthetic DNA encoding luciferases, the use of synthetic 
click beetle luciferases provides advantages such as the measurement of dual 
reporters. As Renilla luciferase is better suited for in vivo imaging (because it 
does not depend on ATP or Mg2+ for reaction, unlike firefly luciferase, and

because coelenterazine is more permeable to the cell membrane than luciferin), 
the synthetic Renilla luciferase gene can be employed in vivo. Further, the 
synthetic Renilla luciferase has improved fidelity and sensitivity in dual 
luciferase assays, e.g., for biological analysis or in drug screening platforms.

Demonstration of the Invention Using Luciferase Genes 

The reporter genes for click beetle luciferase and Renilla luciferase were used
to demonstrate the invention because the reactions catalyzed by the proteins
they encode are significantly easier to quantify than the product of most
genes. However, for the purposes of demonstrating the present invention they
represent genes in general.

Although the click beetle luciferase and Renilla luciferase genes share the
name "luciferase", this should not be interpreted to mean that they originate
from the same family of genes. The two luciferase proteins are evolutionarily
distinct; they have fundamentally different traits and physical structures,
they use vastly different substrates (Figure 17), and they evolved from
completely different families of genes. The click beetle luciferase is 61 kD in
size, uses luciferin as a substrate and evolved from the CoA synthetases. The
Renilla luciferase originates from the sea pansy Renilla reniformis, is 35 kD
in size, uses coelenterazine as a substrate and evolved from the α/β
hydrolases. The only shared trait of these two enzymes is that the reaction
they catalyze results in light output. They are no more similar for resulting
in light output than any other two enzymes would be, for example, simply
because the reaction they catalyze results in heat.

Bioluminescence is the light produced in certain organisms as a result of 
luciferase-mediated oxidation reactions. The luciferase genes, e.g., the genes 

from luminous beetles, sea pansy, and, in particular, the luciferase from Photinus
pyralis (the common firefly of North America), are currently the most popular
luminescent reporter genes. Reference is made to Bronstein et al. (1994) for a 
review of luminescent reporter gene assays and to Wood (1995) for a review of 
the evolution of beetle bioluminescence. See Figure 17 for an illustration of the 

15 reactions catalyzed by each of firefly and click beetle luciferases (17A) and 
Renilla luciferase (17B). 

Firefly luciferase and Renilla luciferase are highly valuable as genetic
reporters due to the convenience, sensitivity and linear range of the luminescence
assay. Today, luciferase is used in virtually every type of experimental
biological system, including, but not limited to, prokaryotic and eukaryotic cell
culture, transgenic plants and animals, and cell-free expression systems. The
firefly luciferase enzyme is derived from a specific North American beetle,
Photinus pyralis. The firefly luciferase enzyme and the click beetle luciferase
enzyme are monomeric proteins (61 kDa) which generate light through
monooxygenation of beetle luciferin utilizing ATP and O2 (Figure 17A). The
Renilla luciferase is derived from the sea pansy Renilla reniformis. The Renilla
luciferase enzyme is a 36 kDa monomeric protein that utilizes O2 and
coelenterazine to generate light (Figure 17B).

The gene encoding firefly luciferase was cloned from Photinus pyralis,
and demonstrated to produce active enzyme in E. coli (de Wet et al., 1987). The
cDNA encoding firefly luciferase (luc) continues to gain favor as the gene of
choice for reporting genetic activity in animal, plant and microbial cells. The
firefly luciferase reaction, modified by the addition of CoA to produce persistent
light emission, provides an extremely sensitive and rapid in vitro assay for
quantifying firefly luciferase expression in small samples of transfected cells or
tissues.

To use firefly luciferase or click beetle luciferase as a genetic reporter,
extracts of cells expressing the luciferase are mixed with substrates (beetle
luciferin, Mg2+, ATP, and O2), and luminescence is measured immediately. The
assay is very rapid and sensitive, providing gene expression data with little
effort. The conventional firefly luciferase assay has been further improved by
including coenzyme A in the assay reagent to yield greater enzyme turnover and
thus greater luminescence intensity (Promega Luciferase Assay Reagent, Cat.#
E1500, Promega Corporation, Madison, Wis.). Using this reagent, luciferase
activity can be readily measured in luminometers or scintillation counters.
Firefly and click beetle luciferase activity can also be detected in living cells in
culture by adding luciferin to the growth medium. This in situ luminescence
relies on the ability of beetle luciferin to diffuse through cellular and
peroxisomal membranes and on the intracellular availability of ATP and O2 in
the cytosol and peroxisome.

Further, although reporter genes are widely used to measure transcription
events, their utility can be limited by the fidelity and efficiency of reporter
expression. For example, in U.S. Patent No. 5,670,356, a firefly luciferase gene
(referred to as luc+) was modified to improve the level of luciferase expression.
While a higher level of expression was observed, it was not determined that
higher expression had improved regulatory control.

The invention will be further described by the following nonlimiting
examples.

Example 1 

Synthetic Click Beetle (RD and GR) Luciferase Nucleic Acid Molecules

LucPplYG is a wild-type click beetle luciferase that emits yellow-green
luminescence (Wood, 1989). A mutant of LucPplYG named YG#81-6G01 was
envisioned. YG#81-6G01 lacks a peroxisome targeting signal, has a lower Km
for luciferin and ATP, has increased signal stability and increased temperature
stability when compared to the wild type (PCT/WO99/14336). YG#81-6G01
was mutated to emit green luminescence by changing Ala at position 224 to Val
(A224V is a green-shifting mutation), or to emit red luminescence by
simultaneously introducing the amino acid substitutions A224H, S247H, N346I,
and H348Q (red-shifting mutation set) (PCT/WO95/18853).

Using YG#81-6G01 as a parent gene, two synthetic gene sequences were
designed. One codes for a luciferase emitting green luminescence (GR) and one
for a luciferase emitting red luminescence (RD). Both genes were designed to 1)
have optimized codon usage for expression in mammalian cells, 2) have a
reduced number of transcriptional regulatory sites including mammalian
transcription factor binding sites, splice sites, poly(A) addition sites and
promoters, as well as prokaryotic (E. coli) regulatory sites, 3) be devoid of
unwanted restriction sites, e.g., those which are likely to interfere with standard
cloning procedures, and 4) have a low DNA sequence identity compared to each
other in order to minimize genetic rearrangements when both are present inside
the same cell. In addition, desired sequences, e.g., a Kozak sequence or
restriction enzyme recognition sites, may be identified and introduced.

Not all design criteria could be met equally well at the same time. The
following priority was established for reduction of transcriptional regulatory
sites: elimination of transcription factor (TF) binding sites received the highest
priority, followed by elimination of splice sites and poly(A) addition sites, and
finally prokaryotic regulatory sites. When removing regulatory sites, the strategy
was to work from the least important to the most important to ensure that the
most important changes were made last. Then the sequence was rechecked for
the appearance of new lower priority sites and additional changes were made as
needed. Thus, the process for designing the synthetic GR and RD gene
sequences, using computer programs described herein, involved 5 optionally
iterative steps that are detailed below.

1. Optimized codon usage and changed A224V to create GRver1;
   separately changed A224H, S247H, H348Q and N346I to create
   RDver1. These particular amino acid changes were maintained
   throughout all subsequent manipulations to the sequence.
2. Removed undesired restriction sites, prokaryotic regulatory sites,
   splice sites, and poly(A) sites, thereby creating GRver2 and RDver2.
3. Removed transcription factor binding sites (first pass) and removed
   any newly created undesired sites as listed in step 2 above, thereby
   creating GRver3 and RDver3.
4. Removed transcription factor binding sites created by step 3 above
   (second pass) and removed any newly created undesired sites as listed
   in step 2 above, thereby creating GRver4 and RDver4.
5. Removed transcription factor binding sites created by step 4 above
   (third pass) and confirmed absence of sites listed in step 2 above,
   thereby creating GRver5 and RDver5.
6. Constructed the actual genes by PCR using synthetic oligonucleotides
   corresponding to fragments of the GRver5 and RDver5 designed
   sequences (Figures 6 and 10), thereby creating GR6 and RD7. GR6,
   upon sequencing, was found to have the serine residue at amino acid
   position 49 mutated to an asparagine and the proline at amino acid
   position 230 mutated to a serine (S49N, P230S). RD7, upon
   sequencing, was found to have the histidine at amino acid position 36
   mutated to a tyrosine (H36Y). These changes occurred during the
   PCR process.
7. The mutations described in step 6 above (S49N, P230S for GR6 and
   H36Y for RD7) were reversed to create GRver5.1 and RDver5.1.
8. RDver5.1 was further modified by changing the arginine codon at
   position 351 to a glycine codon (R351G), thereby creating RDver5.2
   with improved spectral properties compared to RDver5.1.
9. RDver5.2 was further mutated to increase luminescence intensity,
   thereby creating RD156-1H9, which encodes four additional amino
   acid changes (M2I, S349T, K488T, E538V) and three silent
   base changes (SEQ ID NO:18).

1. Optimize codon usage and introduce mutations determining luminescence
color

The starting gene sequence for this design step was YG#81-6G01 (SEQ ID
NO:2).

a) Optimize codon usage:

The strategy was to adapt the codon usage for optimal expression in
human cells and at the same time to avoid E. coli low-usage codons. Based on
these requirements, the best two codons for expression in human cells for all
amino acids with more than two codons were selected (see Wada et al., 1990).
In the selection of codon pairs for amino acids with six codons, the selection was
biased towards pairs that have the largest number of mismatched bases to allow
design of GR and RD genes with minimum sequence identity (codon
distinction):

Arg: CGC/CGT    Leu: CTG/TTG    Ser: TCT/AGC
Thr: ACC/ACT    Pro: CCA/CCT    Ala: GCC/GCT
Gly: GGC/GGT    Val: GTC/GTG    Ile: ATC/ATT

Based on this selection of codons, two gene sequences encoding the YG#81-6G01
luciferase protein sequence were computer generated. The two genes were
designed to have minimum DNA sequence identity and at the same time closely
similar codon usage. To achieve this, each codon in the two genes was replaced
by a codon from the limited list described above in an alternating fashion (e.g.,
Arg(n) is CGC in gene 1 and CGT in gene 2, Arg(n+1) is CGT in gene 1 and CGC
in gene 2).
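
The alternating assignment can be illustrated with a short Python sketch. It is not the program used for the design: the function name `alternate_codons` is hypothetical, the input is assumed to be a list of three-letter amino acid codes, and only the nine amino acids in the codon-pair list above are covered.

```python
# Preferred codon pairs taken from the list above (three-letter codes).
CODON_PAIRS = {
    "Arg": ("CGC", "CGT"), "Leu": ("CTG", "TTG"), "Ser": ("TCT", "AGC"),
    "Thr": ("ACC", "ACT"), "Pro": ("CCA", "CCT"), "Ala": ("GCC", "GCT"),
    "Gly": ("GGC", "GGT"), "Val": ("GTC", "GTG"), "Ile": ("ATC", "ATT"),
}


def alternate_codons(protein):
    """Return two DNA strings that encode the same amino acid list.

    For each amino acid, the n-th occurrence takes the first codon of its
    pair in gene 1 and the second in gene 2; the (n+1)-th occurrence is
    swapped, and so on in alternation.
    """
    seen = {}                      # occurrences counted per amino acid
    gene1, gene2 = [], []
    for aa in protein:
        first, second = CODON_PAIRS[aa]
        count = seen.get(aa, 0)
        if count % 2 == 1:         # odd occurrences swap the pair
            first, second = second, first
        gene1.append(first)
        gene2.append(second)
        seen[aa] = count + 1
    return "".join(gene1), "".join(gene2)


g1, g2 = alternate_codons(["Arg", "Val", "Arg", "Leu"])
print(g1)  # CGCGTCCGTCTG
print(g2)  # CGTGTGCGCTTG
```

Alternating per amino acid, rather than per position, keeps the overall codon usage of the two genes closely similar while making every shared residue differ in at least one base.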

For subsequent steps in the design process it was anticipated that changes
had to be made to this limited optimal codon selection in order to meet other
design criteria; however, the following low-usage codons in mammalian cells
were not used unless needed to meet criteria of higher priority:

Arg: CGA    Leu: CTA    Ser: TCG
Pro: CCG    Val: GTA    Ile: ATA

Also, the following low-usage codons in E. coli were avoided when reasonable
(note that 3 of these match the low-usage list for mammalian cells):

Arg: CGA/CGG/AGA/AGG
Leu: CTA    Pro: CCG    Ile: ATA



b) Introduce mutations determining luminescence color: 

Into one of the two codon-optimized gene sequences was introduced the
single green-shifting mutation and into the other were introduced the 4
red-shifting mutations as described above.

The two output sequences from this first design step were named GRver1
(version 1 GR) and RDver1 (version 1 RD). Their DNA sequences are 63%
identical (594 mismatches), while the proteins they encode differ only by the 4
amino acids that determine luminescence color (see Figures 2 and 3 for an
alignment of the DNA and protein sequences).
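
As a worked check of these figures, the sketch below converts a mismatch count into a percent identity. It assumes an aligned length of 1,626 bases (the size given for each designed gene in the construction section) and assumes the reported percentages are truncated rather than rounded; both function names are illustrative only.

```python
def count_mismatches(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(length, mismatches):
    """Percent identity, truncated to a whole number (an assumption)."""
    return (length - mismatches) * 100 // length


# Assumed aligned length: 1,626 bases per designed gene.
print(percent_identity(1626, 594))  # 63
```

Under the same assumptions, truncation also reproduces the values reported for the later design steps (590, 541, 506 and 504 mismatches give 63, 66, 68 and 69).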

Tables 1 and 2 show, as an example, the codon usage for valine and
leucine in human genes, the parent gene YG#81-6G01, the codon-optimized
synthetic genes GRver1 and RDver1, as well as the final versions of the
synthetic genes after completion of step 5 in the design process (GRver5 and
RDver5). For a complete summary of the codon changes, see Figures 4 and 5.
Table 1: Valine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
GTA       4       13        0        0        1        1
GTC      13        4       25       24       21       26
GTG      24       12       25       25       25       17
GTT       9       20        0        0        3        5

Table 2: Leucine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
CTA       3        5        0        0        0        0
CTC      12        4        0        1       12       11
CTG      24        4       28       27       19       18
CTT       6       12        0        0        1        1
TTA       3       17        0        0        0        0
TTG       6       13       27       27       23       25



2. Remove undesired restriction sites, prokaryotic regulatory sites, splice sites
and poly(A) addition sites

The starting gene sequences for this design step were GRver1 and RDver1.

a) Remove undesired restriction sites:

To check for the presence and location of undesired restriction sites, the
sequences of both synthetic genes were compared against a database of
restriction enzyme recognition sequences (REBASE ver.712,
http://www.neb.com/rebase) using standard sequence analysis software
(GenePro ver 6.10, Riverside Scientific Ent.).

Specifically, the following restriction enzymes were classified as undesired:

- BamH I, Xho I, Sfi I, Kpn I, Sac I, Mlu I, Nhe I, Sma I, Xho I, Bgl II,
  Hind III, Nco I, Nar I, Xba I, Hpa I, Sal I,
- other cloning sites commonly used: EcoR I, EcoR V, Cla I,
- eight-base cutters (commonly used for complex constructs),
- BstE II (to allow N-terminal fusions),
- Xcm I (can generate A/T overhang used for T-vector cloning).

To eliminate undesired restriction sites when found in a synthetic gene, one or
more codons of the synthetic gene sequence were altered in accordance with the
codon optimization guidelines described in 1a above.
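
A restriction-site scan of this kind can be sketched in Python as follows. This is only an illustration and not the GenePro/REBASE workflow: the function name `find_sites` is hypothetical, and the table covers only some of the enzymes named above, using their well-known palindromic six-base recognition sequences (so scanning one strand is sufficient).

```python
# Recognition sequences (5'->3') for a subset of the enzymes listed above.
# All are palindromic, so a hit on one strand is also a hit on the other.
SITES = {
    "BamH I": "GGATCC", "Xho I": "CTCGAG", "Kpn I": "GGTACC",
    "Sac I": "GAGCTC", "Mlu I": "ACGCGT", "Nhe I": "GCTAGC",
    "Sma I": "CCCGGG", "Bgl II": "AGATCT", "Hind III": "AAGCTT",
    "Nco I": "CCATGG", "Xba I": "TCTAGA", "Hpa I": "GTTAAC",
    "Sal I": "GTCGAC", "EcoR I": "GAATTC", "EcoR V": "GATATC",
    "Cla I": "ATCGAT",
}


def find_sites(seq):
    """Return (enzyme, 0-based position) pairs sorted by position."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((enzyme, start))
            start = seq.find(site, start + 1)  # allow overlapping hits
    return sorted(hits, key=lambda hit: hit[1])


print(find_sites("GGATCCCATGGTGAAGCGTGAGAA"))
# [('BamH I', 0), ('Nco I', 5)]
```

Each reported position marks a codon neighborhood in which a synonymous codon change would be needed to remove the site.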

b) Remove prokaryotic (E. coli) regulatory sequences:

To check for the presence and location of prokaryotic regulatory
sequences, the sequences of both synthetic genes were searched for the presence
of the following consensus sequences using standard sequence analysis software
(GenePro):

- TATAAT (-10 Pribnow box of promoter)
- AGGA or GGAG (ribosome binding site; only considered if paired
  with a methionine codon 12 or fewer bases downstream).

To eliminate such regulatory sequences when found in a synthetic gene, one or
more codons of the synthetic gene sequence were altered in accordance with
the codon optimization guidelines described in 1a above.
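
The two prokaryotic checks can be sketched as below. The function name is hypothetical, and "12 or fewer bases downstream" is interpreted here as an ATG that begins no more than 12 bases after the end of the four-base motif; that reading is an assumption, not something the text specifies.

```python
import re


def find_prokaryotic_sites(seq, max_gap=12):
    """Return (label, 0-based position) pairs for candidate E. coli motifs."""
    seq = seq.upper()
    hits = []
    # -10 Pribnow box consensus; the lookahead also reports overlapping hits.
    for match in re.finditer(r"(?=TATAAT)", seq):
        hits.append(("-10 box", match.start()))
    # Ribosome binding site, kept only if an ATG begins within `max_gap`
    # bases after the end of the motif (assumed interpretation).
    for match in re.finditer(r"(?=(AGGA|GGAG))", seq):
        rbs_end = match.start() + 4
        if seq.find("ATG", rbs_end, rbs_end + max_gap + 3) != -1:
            hits.append(("RBS + ATG", match.start()))
    return hits


print(find_prokaryotic_sites("CCTATAATCC"))       # [('-10 box', 2)]
print(find_prokaryotic_sites("AGGAGGTTTTTTATG"))  # [('RBS + ATG', 0), ('RBS + ATG', 1)]
```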

c) Remove splice sites: 
To check for the presence and location of splice sites, the DNA strand
corresponding to the primary RNA transcript of each synthetic gene was
searched for the presence of the following consensus sequences (see Watson et
al., 1983) using standard sequence analysis software (GenePro):

- splice donor site: AG | GTRAGT (exon | intron), the search was
  performed for AGGTRAG and the lower stringency GGTRAGT;
- splice acceptor site: (Y)nNCAG | G (intron | exon), the search was
  performed with n = 1.

To eliminate splice sites found in a synthetic gene, one or more codons of the
synthetic gene sequence were altered in accordance with the codon optimization
guidelines described in 1a above. Splice acceptor sites were generally difficult
to eliminate in one gene without introducing them into the other gene because
they tended to contain one of only two Gln codons (CAG); they were
removed by placing the Gln codon CAA in both genes at the expense of a
slightly increased sequence identity between the two genes.
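
The consensus searches above use IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base). A minimal Python sketch of such a search follows; the names are hypothetical, and the acceptor pattern YNCAGG is the assumed expansion of (Y)nNCAG | G with n = 1.

```python
import re

# IUPAC codes needed for the splice-site consensus sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

SPLICE_MOTIFS = {
    "donor": ["AGGTRAG", "GGTRAGT"],
    "acceptor": ["YNCAGG"],  # assumed expansion of (Y)nNCAG|G for n = 1
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a lookahead regex (overlaps allowed)."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")


def find_splice_sites(seq):
    """Return (kind, motif, 0-based position) for every consensus match."""
    seq = seq.upper()
    hits = []
    for kind, motifs in SPLICE_MOTIFS.items():
        for motif in motifs:
            for match in motif_to_regex(motif).finditer(seq):
                hits.append((kind, motif, match.start()))
    return hits


print(find_splice_sites("CCAGGTAAGTCC"))
# [('donor', 'AGGTRAG', 2), ('donor', 'GGTRAGT', 3)]
```

Only the transcribed strand is searched here, matching the description that the strand corresponding to the primary RNA transcript was examined.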

d) Remove poly(A) addition sites:

To check for the presence and location of poly(A) addition sites, the
sequences of both synthetic genes were searched for the presence of the
following consensus sequence using standard sequence analysis software
(GenePro):

- AATAAA.

To eliminate each poly(A) addition site found in a synthetic gene, one or more
codons of the synthetic gene sequence were altered in accordance with the codon
optimization guidelines described in 1a above. The two output sequences from
this second design step were named GRver2 and RDver2. Their DNA sequences
are 63% identical (590 mismatches) (Figs. 2 and 3).

3. Remove transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver2 and
RDver2.

To check for the presence, location and identity of potential TF binding sites, the
sequences of both synthetic genes were used as query sequences to search a
database of transcription factor binding sites (TRANSFAC v3.2). The
TRANSFAC database (http://transfac.gbf.de/TRANSFAC/index.html) holds
information on gene regulatory DNA sequences (TF binding sites) and proteins
(TFs) that bind to and act through them. The SITE table of TRANSFAC Release
3.2 contains 4,401 entries of individual (putative) TF binding sites (including TF
binding sites in eukaryotic genes, in artificial sequences resulting from
mutagenesis studies and in vitro selection procedures based on random
oligonucleotide mixtures or specific theoretical considerations, and consensus
binding sequences (from Faisst and Meyer, 1992)).

The software tool used to locate and display these TF binding sites in the
synthetic gene sequences was TESS (Transcription Element Search Software,
http://agave.humgen.upenn.edu/tess/index.html). The filtered string-based
search option was used with the following user-defined search parameters:

- Factor Selection Attribute: Organism Classification
- Search Pattern: Mammalia
- Max. Allowable Mismatch %: 0
- Min. element length: 5
- Min. log-likelihood: 10

This parameter selection specifies that only mammalian TF binding sites
(approximately 1,400 of the 4,401 entries in the database) that are at least 5 bases
long will be included in the search. It further specifies that only TF binding sites
that have a perfect match in the query sequence and a minimum log likelihood
(LLH) score of 10 will be reported. The LLH scoring method assigns 2 to an
unambiguous match, 1 to a partially ambiguous match (e.g., A or T match W)
and 0 to a match against 'N'. For example, a search with parameters specified
above would result in a "hit" (positive result or match) for TATAA (SEQ ID
NO:240) (LLH = 10), STRATG (SEQ ID NO:241) (LLH = 10), and
MTTNCNNMA (SEQ ID NO:242) (LLH = 10) but not for TRATG (SEQ ID
NO:243) (LLH = 9) if these four TF binding sites were present in the query
sequence. A lower stringency test was performed at the end of the design
process to re-evaluate the search parameters.
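
The LLH scoring rule can be reproduced with a few lines of Python. This sketch is not part of TESS; the function name is hypothetical, and it assumes that every ambiguity code other than N (including three-base codes, which the text does not address) scores 1.

```python
UNAMBIGUOUS = set("ACGT")


def llh_score(site):
    """Score a binding-site string: 2 per unambiguous base, 1 per
    partially ambiguous base, 0 per 'N'."""
    score = 0
    for base in site.upper():
        if base in UNAMBIGUOUS:
            score += 2
        elif base != "N":      # assumed: any non-N ambiguity code scores 1
            score += 1
    return score


for site in ("TATAA", "STRATG", "MTTNCNNMA", "TRATG"):
    print(site, llh_score(site))
# TATAA 10
# STRATG 10
# MTTNCNNMA 10
# TRATG 9
```

With a minimum LLH of 10, the first three sites would be reported and TRATG would not, in agreement with the example in the text.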

When TESS was tested with a mock query sequence containing known
TF binding sites it was found that the program was unable to report matches to
sites ending with the 3' end of the query sequence. Thus, an extra nucleotide
was added to the 3' end of all query sequences to eliminate this problem.

The first search for TF binding sites using the parameters described
above found about 100 transcription factor binding sites (hits) for each of the
two synthetic genes (GRver2 and RDver2). All sites were eliminated by
changing one or more codons of the synthetic gene sequences in accordance with
the codon optimization guidelines described in 1a above. However, it was
expected that some of these changes created new TF binding sites, other regulatory
sites, and new restriction sites. Thus, steps 2 a-d were repeated as described, and
4 new restriction sites and 2 new splice sites were removed. The two output
sequences from this third design step were named GRver3 and RDver3. Their
DNA sequences are 66% identical (541 mismatches) (Figs. 2 and 3).

4. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver3 and
RDver3.

This fourth step is an iteration of the process described in step 3. The search for
newly introduced TF binding sites yielded about 50 hits for each of the two
synthetic genes. All sites were eliminated by changing one or more codons of
the synthetic gene sequences in general accordance with the codon optimization
guidelines described in 1a above. However, more high to medium usage codons
were used to allow elimination of all TF binding sites. The lowest priority was
placed on maintaining low sequence identity between the GR and RD genes.
Then steps 2 a-d were repeated as described. The two output sequences from
this fourth design step were named GRver4 and RDver4. Their DNA sequences
are 68% identical (506 mismatches) (Figs. 2 and 3).



wo 02/16944 



51 



PCT/USOl/26566 



5. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver4 and
RDver4.

This fifth step is another iteration of the process described in step 3 above. The
search for new TF binding sites introduced in step 4 yielded about 20 hits for
each of the two synthetic genes. All sites were eliminated by changing one or
more codons of the synthetic gene sequences in general accordance with the
codon optimization guidelines described in 1a above. However, more high to
medium usage codons were used (these are all considered "preferred") to allow
elimination of all TF binding sites. The lowest priority was placed on
maintaining low sequence identity between the GR and RD genes. Then steps 2
a-d were repeated as described. Only one acceptor splice site could not be
eliminated. As a final step the absence of all TF binding sites in both genes as
specified in step 3 was confirmed. The two output sequences from this fifth and
last design step were named GRver5 and RDver5. Their DNA sequences are
69% identical (504 mismatches) (Figs. 2 and 3).

Additional evaluation of GRver5 and RDver5

a) Use lower stringency parameters for TESS:

The search for TF binding sites was repeated as described in step 3 above, but
with even less stringent user-defined parameters:

- setting LLH to 9 instead of 10 did not result in new hits;
- setting LLH to 0 through 8 (incl.) resulted in hits for two additional
  sites, MAMAG (22 hits) and CTKTK (24 hits);
- setting LLH to 8 and the minimum element length to 4, the search
  yielded (in addition to the two sites above) different 4-base sites for
  AP-1, NF-1, and c-Myb that are shortened versions of their longer
  respective consensus sites which were eliminated in steps 3-5 above.

It was not realistic to attempt complete elimination of these sites without
introduction of new sites, so no further changes were made.

b) Search different database: 






The Eukaryotic Promoter Database (release 45) contains information about
reliably mapped transcription start sites (1253 sequences) of eukaryotic genes.
This database was searched using BLASTN 1.4.11 with default parameters
(optimized to find nearly identical sequences rapidly; see Altschul et al., 1990) at
the National Center for Biotechnology Information site
(http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST). To test this approach, a portion
of pGL3-Control vector sequence containing the SV40 promoter and enhancer
was used as a query sequence, yielding the expected hits to SV40 sequences. No
hits were found when using the two synthetic genes as query sequences.

Summary of GRver5 and RDver5 synthetic gene properties

Both genes, which at this stage were still only "virtual" sequences in the
computer, have a codon usage that strongly favors mammalian high-usage
codons and minimizes mammalian and E. coli low-usage codons. Figure 4
shows a summary of the codon usage of the parent gene and the various
synthetic gene versions.

Both genes are also completely devoid of eukaryotic TF binding sites
consisting of more than four unambiguous bases, donor and acceptor splice sites
(one exception: GRver5 contains one splice acceptor site), poly(A) addition sites,
specific prokaryotic (E. coli) regulatory sequences, and undesired restriction
sites.

The gene sequence identity between GRver5 and RDver5 is only 69%
(504 base mismatches) while their encoded proteins are 99% identical (4 amino
acid mismatches), see Figures 2 and 3. Their identity with the parent sequence
YG#81-6G01 is 74% (GRver5) and 73% (RDver5), see Figure 2. Their base
composition is 49.9% GC (GRver5) and 49.5% GC (RDver5), compared to
40.2% GC for the parent YG#81-6G01.
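
Base composition figures such as these are computed as the fraction of G and C bases in the sequence. A minimal Python sketch follows; the function name is illustrative, and the test sequence is an arbitrary short string, not one of the designed genes.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)


print(gc_percent("ATGC"))  # 50.0
```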

Construction of synthetic genes

The two synthetic genes were constructed by assembly from synthetic
oligonucleotides in a thermocycler followed by PCR amplification of the
full-length genes (similar to Stemmer et al. (1995) Gene 164, pp. 49-53).
Unintended mutations that interfered with the design goals of the synthetic genes
were corrected.

a) Design of synthetic oligonucleotides: 

The synthetic oligonucleotides were mostly 40mers that collectively code
for both complete strands of each designed gene (1,626 bp) plus flanking regions
needed for cloning (1,950 bp total for each gene; Figure 6). The 5' and 3'
boundaries of all oligonucleotides specifying one strand were generally placed in
a manner to give an average offset/overlap of 20 bases relative to the boundaries
of the oligonucleotides specifying the opposite strand.
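
The staggered layout can be sketched in Python as follows. This is a simplified illustration, not the actual oligonucleotide design: the function names are hypothetical, and the sketch assumes uniform 40-base top-strand oligonucleotides, a fixed 20-base offset (0 < offset < length), and shorter end fragments on the bottom strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_oligos(seq, length=40, offset=20):
    """Split a duplex into top- and bottom-strand oligos with staggered ends."""
    seq = seq.upper()
    top = [seq[i:i + length] for i in range(0, len(seq), length)]
    # Bottom-strand boundaries are shifted by `offset`, so each internal
    # bottom oligo bridges the junction between two adjacent top oligos.
    cuts = [0] + list(range(offset, len(seq), length)) + [len(seq)]
    bottom = [reverse_complement(seq[a:b]) for a, b in zip(cuts, cuts[1:])]
    return top, bottom


top, bottom = tile_oligos("ACGT" * 30)    # 120-base example duplex
print([len(o) for o in top])     # [40, 40, 40]
print([len(o) for o in bottom])  # [20, 40, 40, 20]
```

Because every junction on one strand falls in the middle of an oligonucleotide on the other, annealing produces a contiguous scaffold that a polymerase can fill in and extend during thermocycling.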

The ends of the flanking regions of both genes matched the ends of the
amplification primers (pRAMtailup: 5'-gtactgaga cgacgccagcccaagcttaggcctgagtg
SEQ ID NO:229, and pRAMtaildn: 5'-ggcatgagcgt paactgactgaactagcggccgccgag
SEQ ID NO:230) to allow cloning of the genes into our E. coli expression vector
pRAM (WO99/14336).

A total of 183 oligonucleotides were designed (Figure 6): fifteen
oligonucleotides that collectively encode the upstream and downstream flanking
sequences (identical for both genes; SEQ ID NOs: 35-49) and 168
oligonucleotides (4 x 42) that encode both strands of the two genes (SEQ ID
NOs: 50-217).

All 183 oligonucleotides were run through the hairpin analysis of the
OLIGO software (OLIGO 4.0 Primer Analysis Software © 1989-1991 by
Wojciech Rychlik) to identify potentially detrimental intra-molecular loop
formation. The guidelines for evaluating the analysis results were set according
to recommendations of Dr. Sims (Sigma-Genosys Custom Gene Synthesis
Department): oligos forming hairpins with ΔG < -10 have to be avoided, those
forming hairpins with ΔG ≤ -7 involving the 3' end of the oligonucleotide should
also be avoided, while those with an overall ΔG ≥ -5 should not pose a problem
for this application. The analysis identified 23 oligonucleotides able to form
hairpins with a ΔG between -7.1 and -4.9. Of these, 5 had blocked or nearly
blocked 3' ends (0-3 free bases) and were re-designed by removing 1-4 bases at
their 3' end and adding them to the adjacent oligonucleotide.

The 40mer oligonucleotide covering the sequence complementary to the
poly(A) tail had a very low complexity 3' end (13 consecutive T bases). An
additional 40mer was designed with a high complexity 3' end but a consequently
reduced overlap with one of its complementary oligonucleotides (11 instead of
20 bases) on the opposite strand.
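
The hairpin guidelines above can be written as a small decision function. The sketch is an interpretation rather than the OLIGO software's logic: it assumes the thresholds read as ΔG < -10, ΔG ≤ -7 (with 3' end involvement) and ΔG ≥ -5, and it adds a hypothetical "review" outcome for cases the guidelines do not address.

```python
def classify_hairpin(delta_g, involves_3prime_end):
    """Classify a predicted hairpin by its free energy (delta G)."""
    if delta_g < -10:
        return "avoid"                      # very stable hairpin
    if delta_g <= -7 and involves_3prime_end:
        return "avoid"                      # stable hairpin blocking the 3' end
    if delta_g >= -5:
        return "acceptable"
    return "review"                         # not covered by the guidelines


print(classify_hairpin(-7.1, True))    # avoid
print(classify_hairpin(-4.9, False))   # acceptable
```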

Even though the oligos were designed for use in a thermocycler-based
assembly reaction, they could also be used in a ligation-based protocol for gene
construction. In this approach, the oligonucleotides are annealed in a pairwise
fashion and the resulting short double-stranded fragments are ligated using the
sticky overhangs. However, this would require that all oligonucleotides be
phosphorylated.

b) Gene assembly and amplification 

In a first step, each of the two synthetic genes was assembled in a
separate reaction from 98 oligonucleotides. The total volume for each reaction
was 50 µl:

0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
1.0 U Taq DNA polymerase
0.02 U Pfu DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin
Cycling conditions: (94°C for 30 seconds, 52°C for 30
seconds, and 72°C for 30 seconds) x 55 cycles.
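
The per-oligonucleotide amount quoted above can be checked with simple arithmetic, assuming that 0.5 µM is the combined concentration of all 98 oligonucleotides in the 50 µl reaction. The function name is illustrative.

```python
def pmol_per_oligo(total_conc_um, volume_ul, n_oligos):
    """Amount of each oligo (pmol) when a total concentration is shared
    equally among n_oligos."""
    # 1 µM equals 1 pmol/µl, so µM multiplied by µl gives pmol.
    total_pmol = total_conc_um * volume_ul
    return total_pmol / n_oligos


print(round(pmol_per_oligo(0.5, 50, 98), 3))  # 0.255
```

Under that assumption the reaction contains 25 pmol of oligonucleotide in total, or about 0.25 pmol of each, which agrees with the stated value.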
In a second step, each assembled synthetic gene was amplified in a
separate reaction. The total volume for each reaction was 50 µl:

2.5 µl assembly reaction
5.0 U Taq DNA polymerase
0.1 U Pfu DNA polymerase
1 µM each primer (pRAMtailup, pRAMtaildn)
2 mM MgCl2
0.2 mM dNTPs (each)
Cycling conditions: (94°C for 20 seconds, 65°C for 60
seconds, 72°C for 3 minutes) x 30 cycles.

The assembled and amplified genes were subcloned into the pRAM
vector and expressed in E. coli, yielding 1-2% luminescent GR or RD clones.
Five GR and five RD clones were isolated and analyzed further. Of the five GR
clones, three had the correct insert size, of which one was weakly luminescent
and one had an altered restriction pattern. Of the five RD clones, two had the
correct size insert with an altered restriction pattern and one of those was weakly
luminescent. Overall, the analysis indicated the presence of a large number of
mutations in the genes, most likely the result of errors introduced in the
assembly and amplification reactions.

c) Corrective assembly and amplification 

To remove the large number of mutations present in the full-length
synthetic genes we performed an additional assembly and amplification reaction
for each gene using the proof-reading DNA polymerase Tli. The assembly
reaction contained, in addition to the 98 GR or RD oligonucleotides, a small
amount of DNA from the corresponding full-length clones with mutations
described above. This allows the oligos to correct mutations present in the
templates.

The following assembly reaction was performed for each of the synthetic
genes. The total volume for each reaction was 50 µl:

0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
0.016 pmol plasmid (mix of clones with correct insert size)
2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin
Cycling conditions: 94°C for 30 seconds, then (94°C for
30 seconds, 52°C for 30 seconds, 72°C for 30 seconds) for
55 cycles, then 72°C for 5 minutes.

The following amplification reaction was performed on each of the
assembly reactions. The total volume for each amplification reaction was 50 µl:

1-5 µl of assembly reaction
40 pmol each primer (pRAMtailup, pRAMtaildn)
2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
Cycling conditions: 94°C for 30 seconds, then (94°C for
20 seconds, 65°C for 60 seconds and 72°C for 3 minutes)
for 30 cycles, then 72°C for 5 minutes.

The genes obtained from the corrective assembly and amplification step
were subcloned into the pRAM vector and expressed in E. coli, yielding 75%
luminescent GR or RD clones. Forty-four GR and 44 RD clones were analyzed
with our screening robot (WO99/14336). The six best GR and RD clones were
manually analyzed and one best GR and RD clone was selected (GR6 and RD7).
Sequence analysis of GR6 revealed two point mutations in the coding region,
both of which resulted in an amino acid substitution (S49N and P230S).
Sequence analysis of RD7 revealed three point mutations in the coding region,
one of which resulted in an amino acid substitution (H36Y). It was confirmed
that none of the silent point mutations introduced any regulatory or restriction
sites conflicting with the overall design criteria for the synthetic genes.

d) Reversal of unintended amino acid substitutions 

The unintended amino acid substitutions present in the GR6 and RD7
synthetic genes were reversed by site-directed mutagenesis to match the GRver5
and RDver5 designed sequences, thereby creating GRver5.1 and RDver5.1. The
DNA sequences of the mutated regions were confirmed by sequence analysis.






e) Improve spectral properties 

The RDver5.1 gene was further modified to improve its spectral
properties by introducing an amino acid change (R351G), thereby creating RDver5.2.

pGL3 vectors with RD and GR genes

The parent click beetle luciferase YG#81-6G01 ("YG"), and the synthetic
click beetle luciferase genes GRver5.1 ("GR"), RDver5.1 ("RD"), and
RD156-1H9 were cloned into the four pGL3 reporter vectors (Promega Corp.):

- pGL3-Basic = no promoter, no enhancer
- pGL3-Control = SV40 promoter, SV40 enhancer
- pGL3-Enhancer = SV40 enhancer (3' to luciferase coding sequences)
- pGL3-Promoter = SV40 promoter.

The primers employed in the assembly of GR and RD synthetic genes facilitated
the cloning of those genes into pRAM vectors. To introduce the genes into
pGL3 vectors (Promega Corp., Madison, WI) for analysis in mammalian cells,
each gene in a pRAM vector (pRAM RDver5.1, pRAM GRver5.1, and pRAM
RD156-1H9) was amplified to introduce an Nco I site at the 5' end and an Xba I
site at the 3' end of the gene. The primers for pRAM RDver5.1 and pRAM
GRver5.1 were:

GR -> 5' GGA TCC CAT GGT GAA GCG TGA GAA 3' (SEQ ID NO:231) or
RD -> 5' GGA TCC CAT GGT GAA ACG CGA 3' (SEQ ID NO:232) and
5' CTA GCT TTT TTT TCT AGA TAA TCA TGA AGA C 3' (SEQ ID NO:233)

The primers for pRAM RD156-1H9 were:

5' GCG TAG CCA TGG TAA AGC GTG AGA AAA ATG TC 3' (SEQ ID NO:295) and
5' CCG ACT CTA GAT TAC TAA CCG CCG GCC TTC ACC 3' (SEQ ID NO:296)
The PCR included:

100 ng DNA plasmid
1 µM primer upstream
1 µM primer downstream
0.2 mM dNTPs
1X buffer (Promega Corp.)
5 units Pfu DNA polymerase (Promega Corp.)
Sterile nanopure H2O to 50 µl

The cycling parameters were: 94°C for 5 minutes; (94°C for 30 seconds;
55°C for 1 minute; and 72°C for 3 minutes) x 15 cycles. The purified PCR
product was digested with Nco I and Xba I, ligated with pGL3-control that was
also digested with Nco I and Xba I, and the ligated products introduced to E. coli.
To insert the luciferase genes into the other pGL3 reporter vectors (basic,
promoter and enhancer), the pGL3-control vectors containing each of the
luciferase genes were digested with Nco I and Xba I, ligated with other pGL3
vectors that also were digested with Nco I and Xba I, and the ligated products
introduced to E. coli. Note that the polypeptide encoded by GRver5.1 and
RDver5.1 (and RD156-1H9, see below) nucleic acid sequences in pGL3 vectors
has an amino acid substitution at position 2 to valine as a result of the Nco I site
at the initiation codon in the oligonucleotide.

Because of internal Nco I and Xba I sites, the native gene in YG#81-6G01
was amplified from a Hind III site upstream to a Hpa I site downstream of
the coding region and which included flanking sequences found in the GR and
RD clones. The upstream primer (5'-CAA AAA GCT TGG CAT TCC GGT
ACT GTT GGT AAA GCC ACC ATG GTG AAG CGA GAG-3'; SEQ ID
NO:234) and a downstream primer (5'-CAA TTG TTG TTG TTA ACT TGT
TTA TT-3'; SEQ ID NO:235) were mixed with YG#81-6G01 and amplified
using the PCR conditions above. The purified PCR product was digested with
Nco I and Xba I, ligated with pGL3-control that was also digested with Hind III
and Hpa I, and the ligated products introduced into E. coli. To insert YG#81-6G01
into the other pGL3 reporter vectors (basic, promoter and enhancer), the
pGL3-control vectors containing YG#81-6G01 were digested with Nco I and
Xba I, ligated with the other pGL3 vectors that also were digested with Nco I and
Xba I, and the ligated products introduced to E. coli. Note that the clone of
YG#81-6G01 in the pGL3 vectors has a C instead of an A at base 786, which
yields a change in the amino acid sequence at residue 262 from Phe to Leu
(Figure 2 shows the sequence of YG#81-6G01 prior to introduction into pGL3
vectors). To determine whether the altered amino acid at position 262 affected
the enzyme biochemistry, the clone of YG#81-6G01 was mutated to resemble
the original sequence. Both clones were then tested for expression in E. coli,
physical stability, substrate binding, and luminescence output kinetics. No
significant differences were found.

Partially purified enzymes expressed from the synthetic genes and the
parent gene were employed to determine Km for luciferin and ATP (see Table
3).



Table 3

Enzyme       Km (LH2)    Km (ATP)
YG parent    2 µM        17 µM
GR           1.3 µM      25 µM
RD           24.5 µM     46 µM



In vitro eukaryotic transcription/translation reactions were also conducted
using Promega's TNT T7 Quick system according to manufacturer's
instructions. Luminescence levels were 1 to 37-fold and 1 to 77-fold higher
(depending on the reaction time) for the synthetic GR and RD genes,
respectively, compared to the parent gene (corrected for luminometer spectral
sensitivity).

To test whether the synthetic click beetle luciferase genes and the wild
type click beetle gene have improved expression in mammalian cells, each of the
synthetic genes and the parent gene was cloned into a series of pGL3 vectors and
introduced into CHO cells (Table 8). In all cases, the synthetic click beetle
genes exhibited a higher expression than the native gene. Specifically,
expression of the synthetic GR and RD genes was 1900-fold and 40-fold higher,
respectively, than that of the parent (transfection efficiency normalized by
comparison to native Renilla luciferase gene). Moreover, the data (basic versus
control vector) show that the synthetic genes have reduced basal level
transcription.

Further, in experiments with the enhancer vector where the percentage of
activity in reference to the control is compared between the native and synthetic
gene, the data showed that the synthetic genes have reduced risk of anomalous
transcription characteristics. In particular, the parent gene appeared to contain
one or more internal transcriptional regulatory sequences that are activated by
the enhancer in the vector, and thus is not suitable as a reporter gene while the
synthetic GR and RD genes showed a clean reporter response (transfection
efficiency normalized by comparison to native Renilla luciferase gene). See
Table 9.

The clone names and their corresponding SEQ ID numbers for nucleotide
sequence and amino acid sequence are listed below in Table 4.

Table 4

Clone name    Luciferase Type                SEQ ID NO.      SEQ ID NO.
                                             (nucleotide)    (amino acid)
LUCPPLYG      Wild type YG Click Beetle      1               23
YG#81-6G01    Mutant YG Click Beetle         2               24
GRver1        Synthetic Green Click Beetle   3               25
GRver2        Synthetic Green Click Beetle   4               26
GRver3        Synthetic Green Click Beetle   5               27
GRver4        Synthetic Green Click Beetle   6               28
GRver5        Synthetic Green Click Beetle   7               29
GR6           Synthetic Green Click Beetle   8               30
GRver5.1      Synthetic Green Click Beetle   9               31
RDver1        Synthetic Red Click Beetle     10              32
RDver2        Synthetic Red Click Beetle     11              33
RDver3        Synthetic Red Click Beetle     12              34
RDver4        Synthetic Red Click Beetle     13              218
RDver5        Synthetic Red Click Beetle     14              219
RD7           Synthetic Red Click Beetle     15              220
RDver5.1      Synthetic Red Click Beetle     16              221
RDver5.2      Synthetic Red Click Beetle     17              222
RD156-1H9     Synthetic Red Click Beetle     18              223
RELLUC        Wild type Renilla              19              224
Rlucver1      Synthetic Renilla              20              225
Rlucver2      Synthetic Renilla              21              226
Rluc-final    Synthetic Renilla              22              227



Example 2

Evolution of the RD luciferase gene

RDver5.2 was mutated to increase its luminescence intensity, thereby creating
RD156-1H9 which carries four additional amino acid changes (M2I, S349T,
K488T, E538V) and three silent point mutations (SEQ ID NO:18).

a) Site-directed mutagenesis:

The initial strategy was to use site-directed mutagenesis. There are four
amino acid differences between the GR and RD synthetic genes with H348Q
providing the greatest contribution to red color. Thus, this substitution may also
cause structural changes in the protein that could lead to low light output.
Optimization of positions near this area could increase light output. The
following positions were selected for mutagenesis:

1. S344 (at the edge of the binding pocket for luciferin) - randomize this
codon.

2. A245 (strictly conserved but closest to 348 and at the edge of the active
site pocket) - randomize this codon.

3. I347 (not conserved, next to 348 in sequence) - mutate to hydrophobic
amino acids only.

4. S349 (not conserved, next to 348 in sequence) - mutate to S, T, A, P
only.
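
Restricting a position to a defined set of residues, as for S349 above, can be illustrated with degenerate codons. The following Python sketch is an illustration only and does not reproduce the oligonucleotides actually used; the function name and the choice of the codons NCN and NNK are assumptions. It expands an IUPAC degenerate codon with the standard genetic code and reports the encoded amino acids; NCN happens to encode exactly Ser, Thr, Ala and Pro.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes used in degenerate oligonucleotides.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "N": "ACGT",
}

def encoded_amino_acids(degenerate_codon):
    """Return the set of residues ('*' = stop) encoded by a degenerate codon."""
    options = [IUPAC[base] for base in degenerate_codon.upper()]
    return {CODON_TABLE["".join(codon)] for codon in product(*options)}

# Hypothetical choice for position 349: NCN covers S, T, A and P only.
restricted = encoded_amino_acids("NCN")
# Hypothetical choice for a fully randomized codon (e.g. S344 or A245).
randomized = encoded_amino_acids("NNK")
```

Under these assumptions, a fully randomized codon such as NNK covers all 20 amino acids plus one stop codon, whereas NCN limits the library to the four residues named in item 4.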

Oligonucleotides designed to mutate the above positions were used in a
site-directed mutagenesis experiment (WO 99/14336) and the resulting mutants
were screened for luminescence intensity. There was little variation in light
intensity and only about 25% were luminescent. For more detailed analysis,
clones were picked and analyzed with the screening robot (PCT/WO99/14336).
None of the clones had a luminescence intensity (LI) higher than RDver5.2, but
four of the clones had slightly lower composite Km for luciferin and ATP (Km).

b) Directed evolution:

Protocols and procedures used for the directed evolution are detailed in
PCT/WO99/14336. DNA from the four clones with lower Km was combined
and three libraries of random mutants were produced. The libraries were
screened with the robot and clones with the highest LI values were selected.
These clones were shuffled together and another robotic screen was completed
with an incubation temperature of 46°C. The three clones with the highest LI
values were RD156-0B4, RD156-1A5, and RD156-1H9.

c) Analysis:

The three clones with the highest LI values were selected for manual analysis to
confirm that their luminescence intensity was higher than that of RDver5.2 and
to ensure that their spectral properties were not compromised. One of the clones
was slightly green-shifted, all others maintained the spectral properties of
RDver5.2 (Table 5).



Table 5

Clone                 Peak (nm)    Width (nm)
RD156-0B4             616          68
RD156-1A5             614          70
RD156-1H9             618          69
RDver5.2 (prep #1)    617          70
RDver5.2 (prep #2)    618          69



The Km values for luciferin and the luminescence intensity relative to
RDver5.2 were determined for all three clones in several independent
experiments. All cell samples were processed with CCLR lysis buffer (E14835
Promega Corp., Madison, WI) and diluted 1:10 into buffer (25 mM HEPES pH
7.8, 5% glycerol, 1 mg/ml BSA, 150 mM NaCl). Table 6 summarizes the results
(Lum: luminescence values were normalized to optical density; measurements
for independent experiments are separated by forward slashes) from expression
in bacterial cells. RD156-1H9, the clone with the highest luminescence intensity
(5 to 10-fold increase) also has an about 2-fold higher Km for luciferin.

Table 6

Clone                 Km Luciferin [µM]    Lum (normalized to RDver5.2)
RD156-0B4             8/10                 2.2/2.5
RD156-1A5             13/13                3.1/5.6
RD156-1H9             20/23/23             4/10.9/7.5
RDver5.2 (prep #1)    12/14/14
RDver5.2 (prep #2)    40/50
GRver5.1 (prep #1)    0.5                  64
GRver5.1 (prep #2)    3

Table 7 shows a comparison between the luminescence intensities of
RD156-1H9, GRver5.1 and RDver5.2 normalized to GRver5.1 with and without
correction for the spectral sensitivity of the luminometer photomultiplier tube.
With correction, the luminescence intensity of clone RD156-1H9 was only about
2-fold lower than that of GRver5.1. The luciferin Km for clone RD156-1H9 is
approximately 40-fold higher than GRver5.1. RD156-1H9 is thermostable at
50°C for at least 2 hours.

Table 7

Name         No Correction    with Correction
RDver5.2     0.016            0.06
GRver5.1     1.000            1.00
RD156-1H9    0.116            0.45



Tables 8 and 9 show a comparison of luciferase expression levels in CHO
cells. Table 8 shows the expression levels only from the control vectors in
comparison to the firefly luciferase gene (RLU = relative light units). Table 9
shows a comparison of the expression levels in all four pGL3 vectors calculated
as a percent of the expression level in pGL3-control.

Table 8

Synthetic Click Beetle Gene Expression

Control vector    RLU
YG#81-6G01        177
GRver5.1          343,417
RDver5.1          7,161
RD156-1H9         20,802
FireFly           488,016
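
The fold differences quoted in the text can be recomputed from the control-vector values in Table 8. The short Python sketch below is illustrative only; the dictionary and function names are assumptions, and the raw values are taken from Table 8 without any additional normalization for transfection efficiency.

```python
# Relative light units for the pGL3-control constructs, copied from Table 8.
RLU = {
    "YG#81-6G01": 177,
    "GRver5.1": 343417,
    "RDver5.1": 7161,
    "RD156-1H9": 20802,
}

def fold_over_parent(rlu, parent="YG#81-6G01"):
    """Express each construct's signal as a multiple of the parent gene's signal."""
    return {name: value / rlu[parent] for name, value in rlu.items() if name != parent}

folds = fold_over_parent(RLU)
for name, fold in folds.items():
    print(f"{name}: {fold:.0f}-fold over parent")
```

The GR and RD ratios obtained this way are in line with the approximately 1900-fold and 40-fold increases stated above.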



Table 9

Synthetic Click Beetle Gene Expression

Vector                Percent of control vector
YG-control            100
RD-control            100
GR-control            100
RD156-1H9 control     100
YG-basic              3.3
RD-basic              1.0
GR-basic              0.2
RD156-1H9 basic       0.3
YG-promoter           4.2
RD-promoter           15.1
GR-promoter           5.7
RD156-1H9 promoter    15.5
YG-enhancer           51.5
RD-enhancer           2.8
GR-enhancer           1.4
RD156-1H9 enhancer    0.3



Example 3

Synthetic Renilla Luciferase Nucleic Acid Molecule

The synthetic Renilla luciferase genes prepared include 1) an introduced
Kozak sequence, 2) codon usage optimized for mammalian (human) expression,
3) a reduction or elimination of unwanted restriction sites, 4) removal of
prokaryotic regulatory sites (ribosome binding site and TATA box), 5) removal
of splice sites and poly(A) addition sites, and 6) a reduction or elimination of
mammalian transcriptional factor binding sequences.
The process of computer-assisted design of synthetic Renilla luciferase
genes by iterative rounds of codon optimization and removal of transcription
factor binding sites and other regulatory sites as well as restriction sites can be
described in three steps:

1. Using the wild type Renilla luciferase gene as the parent gene, codon usage
was optimized, one amino acid was changed (T->A) to generate a Kozak
consensus sequence, and undesired restriction sites were eliminated thereby
creating synthetic gene Rlucver1.

2. Remove prokaryotic regulatory sites, splice sites, poly(A) sites and
transcription factor (TF) binding sites (first pass). Then remove newly
created TF binding sites. Then remove newly created undesired restriction
enzyme sites, prokaryotic regulatory sites, splice sites, and poly(A) sites
without introducing new TF binding sites. This thereby created Rlucver2.

3. Change 3 bases of Rlucver2 thereby creating Rluc-final.

4. The actual gene was then constructed from synthetic oligonucleotides
corresponding to the Rluc-final designed sequence. All mutations resulting
from the assembly or PCR process were corrected. This gene is Rluc-final
(SEQ ID NO:22) and encodes the amino acid sequence of SEQ ID NO:227.

Codon Selection

Starting with the Renilla reniformis luciferase sequence in Genbank
(Accession No. M63501, SEQ ID NO:19), codons were selected based on codon
usage for optimal expression in human cells and to avoid E. coli low-usage
codons. The best codon for expression in human cells (or the best two codons if
found at a similar frequency) was chosen for all amino acids with more than one
codon (Wada et al., 1990):



Arg: CGC        Lys: AAG
Leu: CTG        Asn: AAC
Ser: TCT/AGC    Gln: CAG
Thr: ACC        His: CAC
Pro: CCA/CCT    Glu: GAG
Ala: GCC        Asp: GAC
Gly: GGC        Tyr: TAC
Val: GTG        Cys: TGC
Ile: ATC/ATT    Phe: TTC

In cases where two codons were selected for one amino acid, they were
used in an alternating fashion. To meet other criteria for the synthetic gene, the
initial optimal codon selection was modified to some extent later. For example,
introduction of a Kozak sequence required the use of GCT for Ala at amino acid
position 2 (see below).
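
The initial codon assignment, including the alternating use of two preferred codons, can be sketched as a simple back-translation. The Python code below is a minimal illustration under stated assumptions: the function name is hypothetical, Met (ATG) and Trp (TGG) are added because each has only one codon, and none of the later adjustments (Kozak sequence, site removal) are modeled.

```python
from itertools import cycle

# Preferred human codons from the selection table above; one-letter amino acid codes.
PREFERRED = {
    "R": ["CGC"], "K": ["AAG"], "L": ["CTG"], "N": ["AAC"],
    "S": ["TCT", "AGC"], "Q": ["CAG"], "T": ["ACC"], "H": ["CAC"],
    "P": ["CCA", "CCT"], "E": ["GAG"], "A": ["GCC"], "D": ["GAC"],
    "G": ["GGC"], "Y": ["TAC"], "V": ["GTG"], "C": ["TGC"],
    "I": ["ATC", "ATT"], "F": ["TTC"],
    # Single-codon amino acids (not listed in the table because no choice exists).
    "M": ["ATG"], "W": ["TGG"],
}

def back_translate(protein):
    """Back-translate a protein, alternating between codons where two are preferred."""
    choosers = {aa: cycle(codons) for aa, codons in PREFERRED.items()}
    return "".join(next(choosers[aa]) for aa in protein.upper())

# Hypothetical peptide used only to show the alternation for Ser and Pro.
example = back_translate("MSSPPI")
```

Each amino acid keeps its own position in the alternation, so the first Ser receives TCT and the second AGC regardless of the residues in between.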

The following low-usage codons in mammalian cells were not used
unless needed: Arg: CGA, CGU; Leu: CTA, UUA; Ser: TCG; Pro: CCG;
Val: GTA; and Ile: ATA. The following low-usage codons in E. coli were also
avoided when reasonable (note that 3 of these match the low-usage list for
mammalian cells): Arg: CGA/CGG/AGA/AGG; Leu: CTA; Pro: CCC; Ile:
ATA.

Introduction of Kozak Sequences

The Kozak sequence: 5' aaccATGGCT 3' (SEQ ID NO:293) (the Nco I
site is underlined, the coding region is shown in capital letters) was introduced to
the synthetic Renilla luciferase gene. The introduction of the Kozak sequence
changes the second amino acid from Thr to Ala (GCT).

Removal of undesired restriction sites

REBASE ver. 808 (updated August 1, 1998; Restriction Enzyme Database;
www.neb.com/rebase) was employed to identify undesirable restriction sites as
described in Example 1. The following undesired restriction sites (in addition to
those described in Example 1) were removed according to the process described
in Example 1: EcoICR I, Nde I, Nsi I, Sph I, Spe I, Xma I, Pst I.

The version of Renilla luciferase (Rluc) which incorporates all these
changes is Rlucver1.
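
Locating the additional undesired restriction sites in a candidate sequence can be sketched as a plain string search. The Python code below is an illustration and not the REBASE-based procedure of Example 1; the function name and the test sequence are assumptions. The recognition sequences are the standard ones for the listed enzymes, and because all of them are palindromic, scanning one strand is sufficient.

```python
import re

# Recognition sequences of the additional undesired sites listed above.
SITES = {
    "EcoICR I": "GAGCTC",
    "Nde I": "CATATG",
    "Nsi I": "ATGCAT",
    "Sph I": "GCATGC",
    "Spe I": "ACTAGT",
    "Xma I": "CCCGGG",
    "Pst I": "CTGCAG",
}

def find_sites(sequence):
    """Return {enzyme: [0-based start positions]} for every site present."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, site in SITES.items():
        # Lookahead so that overlapping occurrences are also reported.
        positions = [m.start() for m in re.finditer(f"(?={site})", sequence)]
        if positions:
            hits[enzyme] = positions
    return hits

# Hypothetical test sequence containing one Nde I and one Pst I site.
hits = find_sites("AAACATATGTTTCTGCAG")
```

In a design loop, each reported position would be removed by a synonymous codon change and the sequence rescanned, since a change may create a new site elsewhere.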

Removal of prokaryotic (E. coli) regulatory sequences, splice sites, and poly(A)
sites

The priority and process for eliminating transcription regulation sites was
as described in Example 1.

Removal of TF binding sites

The same process, tools, and criteria were used as described in Example
1; however, the newer version 3.3 of the TRANSFAC database was employed.

After removing prokaryotic regulatory sequences, splice sites and
poly(A) sites from Rlucver1, the first search for TF binding sites identified about
60 hits. All sites were eliminated with the exception of three that could not be
removed without altering the amino acid sequence of the synthetic Renilla gene:

1. site at position 63 composed of two codons for W
(TGGTGG), for CAC-binding protein T00076;

2. site at position 522 composed of codons for EMV
(AANATGGTN), for myc-DF1 T00517;

3. site at position 885 composed of codons for EMG
(GARATGGGN), for myc-DF1 T00517.

The subsequent second search for (newly introduced) TF binding sites yielded
about 20 hits. All new sites were eliminated, leaving only the three sites
described above. Finally, any newly introduced restriction sites, prokaryotic
regulatory sequences, splice sites and poly(A) sites were removed without
introducing new TF binding sites if possible.

Rlucver2 was obtained (SEQ ID Nos. 21 and 226). 
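
Why some sites cannot be removed without changing the protein can be seen by enumerating every synonymous coding sequence for the affected residues. The Python sketch below is illustrative only (the function name is an assumption): for the two tryptophans at position 63 only one coding sequence exists, and for Glu-Met-Gly every possible sequence has the form GARATGGGN.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sequences(peptide):
    """Return every DNA sequence that encodes the given peptide."""
    codons_for = {}
    for codon, aa in CODON_TABLE.items():
        codons_for.setdefault(aa, []).append(codon)
    choices = [codons_for[aa] for aa in peptide.upper()]
    return {"".join(combo) for combo in product(*choices)}

ww_options = synonymous_sequences("WW")    # two Trp codons: no alternative
emg_options = synonymous_sequences("EMG")  # Glu-Met-Gly: 2 x 1 x 4 alternatives
```

A binding site whose consensus matches every member of such a set can only be eliminated by changing an amino acid, which the design criteria avoided.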

As in Example 1, lower stringency search parameters were specified for
the TESS filtered string search to further evaluate the synthetic Renilla gene.

With the LLH reduced from 10 to 9 and the minimum element length
reduced from 5 to 4, the TESS filtered string search did not show any new hits.
When, in addition to the parameter changes listed above, the organism
classification was expanded from "mammalia" to "chordata", the search yielded
only four more TF binding sites. When the Min LLH was further reduced to
between 8 and 0, the search showed two additional 5-base sites (MAMAG and
CTKTK) which combined had four matches in Rlucver2, as well as several
4-base sites. Also as in Example 1, Rlucver2 was checked for hits to entries in the
EPD (Eukaryotic Promoter Database, Release 45). Three hits were determined
(one to Mus musculus promoter H-2L^d (Cell, 44, 261 (1986)), one to Herpes
Simplex Virus type 1 promoter b'g'2.7 kb, and one to Homo sapiens DHFR
promoter (J. Mol. Biol., 176, 169 (1984))). However, no further changes were
made to Rlucver2.
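
Degenerate site definitions such as MAMAG and CTKTK use IUPAC ambiguity codes (M = A or C, K = G or T). The Python sketch below shows one way to count matches of such a motif in a sequence. It is an illustration only, not the TESS search itself; the function name and test sequences are assumptions, and only the given strand is scanned.

```python
import re

# IUPAC ambiguity codes translated into regular-expression character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "K": "[GT]", "R": "[AG]", "Y": "[CT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}

def count_motif(sequence, motif):
    """Count (possibly overlapping) matches of an IUPAC motif on the given strand."""
    pattern = "(?=" + "".join(IUPAC_REGEX[base] for base in motif.upper()) + ")"
    return len(re.findall(pattern, sequence.upper()))

# Hypothetical sequences used only to exercise the function.
mamag_hits = count_motif("CACAGTTAAAAG", "MAMAG")
ctktk_hits = count_motif("CTGTT", "CTKTK")
```

A complete scan would also search the reverse complement, since a transcription factor binding site may lie on either strand.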



Summary of Properties for Rlucver2
- All 30 low usage codons were eliminated. The introduction of a Kozak
  sequence changed the second amino acid from Thr to Ala;
- base composition: 55.7% GC (Renilla wild-type parent gene: 36.5%);
- one undesired restriction site could not be eliminated: EcoR V at position
  488;
- the synthetic gene had no prokaryotic promoter sequence but one
  potentially functional ribosome binding site (RBS) at positions 867-73
  (about 13 bases upstream of a Met codon) could not be eliminated;
- all poly(A) addition sites were eliminated;
- splice sites: 2 donor splice sites could not be eliminated (both share the
  amino acid sequence MGK);
- TF sites: all sites with a consensus of >4 unambiguous bases were
  eliminated (about 280 TF binding sites were removed) with 3 exceptions
  due to the preference to avoid changes to the amino acid sequence.

Synthetic Renilla luciferase sequences are shown in Figures 7 and 8. A codon
usage comparison is shown in Figure 9.
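
The base composition figures in the summary are simple GC percentages. A minimal Python sketch of that calculation is shown below; the function name is an assumption and the test string is the Kozak sequence quoted in this Example rather than a full gene.

```python
def gc_percent(sequence):
    """Return the percentage of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    gc_count = sum(sequence.count(base) for base in "GC")
    return 100.0 * gc_count / len(sequence)

# Kozak sequence from this Example, used only as a short test input.
kozak_gc = gc_percent("aaccATGGCT")
```

Applied to the full coding sequences, the same calculation would be expected to give the 55.7% and 36.5% values reported for Rlucver2 and the wild-type gene.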

When introduced into pGL3, Rluc-final has a Kozak sequence
(CACCATGGCT). The changes in Rluc-final relative to Rlucver2 were
introduced during gene assembly. One change was at position 619, a C to an A,
which eliminated a eukaryotic promoter sequence and reduced the stability of a
hairpin structure in the corresponding oligonucleotide employed to assemble the
gene. Other changes included a change from CGC to AGA at positions 218-220
(resulted in a better oligonucleotide for PCR).

Gene Assembly Strategy

The gene assembly protocol employed for the synthetic Renilla luciferase
was similar to that described in Example 1. The oligonucleotides employed are
shown in Figure 10.

Sense Strand primer:
5' AACCATGGCTTCCAAGGTGTAGGACGCGGAGCAAGGGAAA 3' (SEQ ID NO:236)

Anti-sense Strand primer:
5' GCTCTAGAATTACTGCTCGTTCTTCAGCACGCGCTCCACG 3' (SEQ ID NO:237)

The resulting synthetic gene fragment was cloned into a pRAM vector
using Nco I and Xba I. Two clones having the correct size insert were
sequenced. Four to six mutations were found in the synthetic gene from each
clone. These mutations were fixed by site-directed mutagenesis (Gene Editor
from Promega Corp., Madison, WI) and swapping the correct regions between
these two genes. The corrected gene was confirmed by sequencing.

Other Vectors

To prepare an expression vector for the synthetic Renilla luciferase gene
in a pGL3-control vector backbone, 5 µg of pGL3-control was digested with
Nco I and Xba I in 50 µl final volume with 2 µl of each enzyme and 5 µl 10X
buffer B (nanopure water was used to fill the volume to 50 µl). The digestion
reaction was incubated at 37°C for 2 hours, and the whole mixture was run on a
1% agarose gel in 1X TAE. The desired vector backbone fragment was purified
using Qiagen's QIAquick gel extraction kit.

The native Renilla luciferase gene fragment was cloned into pGL3-control
vector using two oligonucleotides, Nco I-RL-F and Xba I-RL-R, to PCR
amplify native Renilla luciferase gene using pRL-CMV as the template. The
sequence for Nco I-RL-F is
5'-CGCTAGGCATGGCTTCGAAAGTTTATGATCC-3' (SEQ ID NO:238); the
sequence for Xba I-RL-R is
5'-GGGGAGTAAGTCTAGAATTATTGTT-3' (SEQ ID NO:239). The PCR
reaction was carried out as follows:
Reaction mixture (for 100 µl):

DNA template (Plasmid)    1.0 µl (1.0 ng/µl final)
10X Rec. Buffer           10.0 µl (Stratagene Corp.)
dNTPs (25 mM each)        1.0 µl (final 250 µM)
Primer 1 (10 µM)          2.0 µl (0.2 µM final)
Primer 2 (10 µM)          2.0 µl (0.2 µM final)
Pfu DNA Polymerase        2.0 µl (2.5 U/µl, Stratagene Corp.)
Double distilled water    82.0 µl

PCR Reaction: heat 94°C for 2 minutes; (94°C for 20 seconds;
65°C for 1 minute; 72°C for 2 minutes; then 72°C for 5 minutes) x 25 cycles,
then incubate on ice. The PCR amplified fragment was cut from a gel, and the
DNA purified and stored at -20°C.
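
The final concentrations in the reaction mixture follow from the dilution relation C1 x V1 = C2 x V2. The Python sketch below is a consistency check of the listed volumes and is not part of the original protocol; the function and variable names are assumptions.

```python
# Component volumes (µl) for the 100 µl reaction described above.
MIX_UL = {
    "DNA template": 1.0,
    "10X buffer": 10.0,
    "dNTPs": 1.0,
    "Primer 1": 2.0,
    "Primer 2": 2.0,
    "Pfu DNA polymerase": 2.0,
    "water": 82.0,
}

def final_concentration(stock, volume_ul, total_ul):
    """Dilution: final = stock * (volume added / total volume); units follow the stock."""
    return stock * volume_ul / total_ul

total_ul = sum(MIX_UL.values())
# dNTP stock is 25 mM each, i.e. 25,000 µM.
dntp_final_um = final_concentration(25_000.0, MIX_UL["dNTPs"], total_ul)
# Primer stocks are 10 µM.
primer_final_um = final_concentration(10.0, MIX_UL["Primer 1"], total_ul)
# Polymerase stock is 2.5 U/µl; total units added to the reaction.
pfu_units = 2.5 * MIX_UL["Pfu DNA polymerase"]
```

With these volumes the components sum to 100 µl, each dNTP is at 250 µM, each primer at 0.2 µM, and 5 units of polymerase are present.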

To introduce native Renilla luciferase gene fragment into pGL3-control
vector, 5 µg of the PCR product of the native Renilla luciferase gene
(RAM-RL-synthetic) was digested with Nco I and Xba I. The desired Renilla
luciferase gene fragment was purified and stored at -20°C.

Then 100 ng of insert and 100 ng of pGL3-control vector backbone were
digested with restriction enzymes Nco I and Xba I and ligated together. Then 2
µl of the ligation mixture was transformed into JM109 competent cells. Eight
ampicillin resistance clones were picked and their DNA isolated. DNA from
each positive clone of pGL3-control-native and pGL3-control-synthetic was
purified. The correct sequences for the native gene and the synthetic gene in the
vectors were confirmed by DNA sequencing.

To determine whether the synthetic Renilla luciferase gene has improved
expression in mammalian cells, the gene was cloned into the mammalian
expression vector pGL3-control vector under the control of SV40 promoter and
SV40 early enhancer (Fig. 13A). The native Renilla luciferase gene was also
cloned into the pGL3-control vector so that the expression from synthetic gene
and the native gene could be compared. The expression vectors were then
transfected into four common mammalian cell lines (CHO, NIH3T3, HeLa and
CV-1; Table 10), and the expression levels compared between the vectors with
the synthetic gene versus the native gene. The amount of DNA used was at two
different levels to ascertain that expression from the synthetic gene is
consistently increased at different expression levels. The results show a 70-600
fold increase of expression for the synthetic Renilla luciferase gene in these cells
(Table 10).

Table 10

Enhanced Synthetic Renilla Gene Expression

Cell Type    Amount Vector    Fold Expression Increase
CHO          0.2 µg           142
             2.8 µg           145
NIH3T3       0.2 µg           326
             2.0 µg           593
HeLa         0.2 µg           185
             1.0 µg           103
CV-1         0.2 µg           68
             2.0 µg           72



One important advantage of a luciferase reporter is its short protein
half-life. The enhanced expression could also result from extended protein half-life
and, if so, this would be a disadvantage of the new gene. This possibility
was ruled out by a cycloheximide chase ("CHX Chase") experiment (Figure 14),
which demonstrated that no increase in protein half-life resulted from
the humanized Renilla luciferase gene.

To ensure that the increase in expression is not limited to one expression
vector backbone, and is not promoter specific and/or cell specific, a synthetic Renilla
gene (Rluc-final) as well as the native Renilla gene were cloned into different vector
backbones and under different promoters (Figure 13B). The synthetic gene
always exhibited increased expression compared to its wild-type counterpart
(Table 11).



WO 02/16944                                                PCT/US01/26566



Table 11

Renilla Gene Expression: native v. synthetic (Rluc-final)

Vector                         NIH-3T3        HeLa           CHO
pRL-tk, native                 3,834.6        922.4          7,671.9
pRL-tk, synthetic              13,252.5       9,040.2        41,743.5
pRL-CMV, native                168,062.2      842,482.5      153,539.5
pRL-CMV, synthetic             2,168,129      8,440,306      2,532,576
pRL-SV40, native               224,224.4      34o,787.0      85,323.6
pRL-SV40, synthetic            1,469,588      2,632,510      i,4z2,oJU
pRL-null, native               2,853.8        431.7          2,434
pRL-null, synthetic            9,151.17       2,439          28,317.1
pRGL3b, native                 12             21.8           17
pRGL3b, synthetic              130.5          212.4          1,094.5
pRGL3-tk, native               27.9           155.5          186.4
pRGL3-tk, synthetic            6,778.2        8,782.5        9,685.9
pRL-tk no intron, native       31.8           165            93.4
pRL-tk no intron, synthetic    6,665.5        6,379          21,433.1



Table 12

Renilla Luciferase Expression in Mammalian Cells

Percent of control vector

Vector                    CHO cells    NIH3T3 cells    HeLa cells
pRL-control native        100          100             100
pRL-control synthetic     100          100             100
pRL-basic native          4.1          5.6             0.2
pRL-basic synthetic       0.4          0.1             0.0
pRL-promoter native       5.9          7.8             0.6
pRL-promoter synthetic    15.0         9.9             1.1
pRL-enhancer native       42.1         123.9           52.7
pRL-enhancer synthetic    2.6          1.5             5.4

(Vector backbones illustrated in Figure 13A)

With reduced spurious expression the synthetic gene should exhibit less
basal level transcription in a promoterless vector. The synthetic and native
Renilla luciferase genes were cloned into the pGL3-basic vector to compare the
basal level of transcription. Because the synthetic gene itself has increased
expression efficiency, the activity from the promoterless vector cannot be
compared directly to judge the difference in basal transcription; rather, this is
taken into consideration by comparing the percentage of activity from the
promoterless vector in reference to the control vector (expression from the basic
vector divided by the expression in the fully functional expression vector with
both promoter and enhancer elements). The data demonstrate that the synthetic
Renilla luciferase has a lower level of basal transcription than the native gene
(Table 12).

It is well known to those skilled in the art that an enhancer can
substantially stimulate promoter activity. To test whether the synthetic gene has
reduced risk of inappropriate transcriptional characteristics, the native and
synthetic gene were introduced into a vector with an enhancer element
(pGL3-enhancer vector). Because the synthetic gene has higher expression efficiency,
the activity of both cannot be compared directly to compare the level of
transcription in the presence of the enhancer; however, this is taken into account
by using the percentage of activity from enhancer vector in reference to the
control vector (expression in the presence of enhancer divided by the expression
in the fully functional expression vector with both promoter and enhancer
elements). Such results show that when native gene is present, the enhancer
alone is able to stimulate transcription from 42-124% of the control; however,
when the native gene is replaced by the synthetic gene in the same vector, the
activity only constitutes 1-5% of the value when the same enhancer and a strong
SV40 promoter are employed. This clearly demonstrates that synthetic gene has
reduced risk of spurious expression (Table 12).
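
The normalization described above (activity in a test vector expressed as a percentage of the activity in the control vector) and the resulting native-versus-synthetic comparison can be sketched as follows. The Python code is illustrative only; the function names are assumptions, and the percentages are the enhancer-vector values from Table 12.

```python
# Enhancer-vector activity as percent of control vector, copied from Table 12.
ENHANCER_PERCENT = {
    "CHO": {"native": 42.1, "synthetic": 2.6},
    "NIH3T3": {"native": 123.9, "synthetic": 1.5},
    "HeLa": {"native": 52.7, "synthetic": 5.4},
}

def percent_of_control(test_rlu, control_rlu):
    """Activity in a test vector expressed as a percentage of the control vector."""
    return 100.0 * test_rlu / control_rlu

def native_to_synthetic_ratio(table):
    """How many times higher the relative enhancer response is for the native gene."""
    return {cell: values["native"] / values["synthetic"] for cell, values in table.items()}

ratios = native_to_synthetic_ratio(ENHANCER_PERCENT)
```

Because each gene is normalized to its own control vector, the ratio is independent of the higher overall expression efficiency of the synthetic gene.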

The synthetic Renilla gene (Rluc-final) was also tested to
compare translation efficiency with the native gene. In a T7 quick coupled
transcription/translation system (Promega Corp., Madison, WI), pRL-null native
plasmid (having the native Renilla luciferase gene under the control of the T7
promoter) or the same amount of pRL-null synthetic plasmid (having the
synthetic Renilla luciferase gene under the control of the T7 promoter) was
added to the TNT reaction mixture and luciferase activity measured every
5 minutes up to 60 minutes. Dual Luciferase assay kit (Promega Corp.) was
used to measure Renilla luciferase activity. The data showed that improved
expression was obtained from the synthetic gene (Figure 15A, B). To further
evidence the increased translation efficiency of the synthetic gene, RNA was
prepared by an in vitro transcription system, then purified. pRL-null (native or
synthetic) vectors were linearized with BamH I. The DNA was purified by
multiple phenol-chloroform extraction followed by ethanol precipitation. An in
vitro T7 transcription system was employed to prepare RNAs. The DNA
template was removed by using RNase-free DNase, and RNA was purified by
phenol-chloroform extraction followed by multiple isopropanol precipitations.
The same amount of purified RNA, either for the synthetic gene or the native
gene, was then added to a rabbit reticulocyte lysate (Figure 15C, D) or wheat
germ lysate (Figure 15E, F). Again, the synthetic Renilla luciferase gene RNA
produced more luciferase than the native one. These data suggest that the
translation efficiency is improved by the synthetic sequence. To determine why
the synthetic gene was highly expressed in wheat germ, plant codon usage was
determined. The lowest usage codons in higher plants coincided with those in
mammals.

Reporter gene assays are widely used to study transcriptional regulation
events. This is often carried out in co-transfection experiments, in which, along
with the primary reporter construct containing the promoter being tested, a
second control reporter under a constitutive promoter is transfected into cells
as an internal control to normalize experimental variations, including
transfection efficiencies, between the samples. Control reporter signal,
potential promoter cross talk between the control reporter and primary reporter,
as well as potential regulation of the control reporter by experimental
conditions, are important aspects to consider when selecting a reliable
co-reporter vector.

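The normalization step described above can be sketched as follows. This is a minimal illustration only: the readings are hypothetical relative light unit values, not data from this specification, and the function name `normalize` is an assumption introduced for the example.

```python
# Hypothetical luminescence readings (relative light units) for three wells.
# "primary" is the reporter under the promoter being tested; "control" is the
# co-transfected reporter under a constitutive promoter.
samples = [
    {"name": "well_1", "primary": 12000.0, "control": 400.0},
    {"name": "well_2", "primary": 18000.0, "control": 600.0},
    {"name": "well_3", "primary": 6000.0, "control": 200.0},
]


def normalize(primary, control):
    """Return the primary reporter signal divided by the co-reporter signal."""
    if control <= 0:
        raise ValueError("control reporter signal must be positive")
    return primary / control


normalized = {s["name"]: normalize(s["primary"], s["control"]) for s in samples}

for name, ratio in normalized.items():
    print(f"{name}: normalized activity = {ratio:.1f}")
```

In this toy data set the raw primary signals differ threefold, yet the normalized values are equal, which is the situation where the differences are explained by transfection efficiency alone.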
As described above, vector constructs were made by cloning the synthetic
Renilla luciferase gene into different vector backbones under different
promoters. All the constructs showed higher expression in the three mammalian
cell lines tested (Table 11). Thus, with better expression efficiency, the synthetic
Renilla luciferase gives a higher signal when transfected into mammalian cells.

Because a higher signal is obtained, less promoter activity is required to
achieve the same reporter signal, which reduces the risk of promoter
interference. CHO cells were transfected with 50 ng pGL3-control (firefly luc+)
plus one of 5 different amounts of native pRL-TK plasmid (50, 100, 500, 1000,
or 2000 ng) or synthetic pRL-TK (5, 10, 50, 100, or 200 ng). To each
transfection, pUC19 carrier DNA was added to a total of 3 µg DNA. Shown in
Figure 16 is the experiment demonstrating that 10-fold less synthetic pRL-TK
DNA gives a signal similar to or greater than that of the native gene, with
reduced risk of inhibiting expression from the primary reporter pGL3-control.
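The DNA amounts in the CHO experiment imply a simple calculation for the pUC19 carrier in each transfection. The sketch below works it out from the quantities stated in the text (3 µg total, 50 ng pGL3-control, and the listed pRL-TK doses); the function and variable names are assumptions introduced for illustration.

```python
TOTAL_DNA_NG = 3000      # 3 µg total DNA per transfection (from the text)
PGL3_CONTROL_NG = 50     # firefly primary reporter plasmid (from the text)

NATIVE_PRL_TK_NG = [50, 100, 500, 1000, 2000]
SYNTHETIC_PRL_TK_NG = [5, 10, 50, 100, 200]


def carrier_needed(co_reporter_ng, total_ng=TOTAL_DNA_NG, primary_ng=PGL3_CONTROL_NG):
    """Return ng of carrier DNA needed to reach the total DNA amount."""
    carrier = total_ng - primary_ng - co_reporter_ng
    if carrier < 0:
        raise ValueError("reporter plasmids already exceed the total DNA amount")
    return carrier


native_mix = {ng: carrier_needed(ng) for ng in NATIVE_PRL_TK_NG}
synthetic_mix = {ng: carrier_needed(ng) for ng in SYNTHETIC_PRL_TK_NG}

for label, mix in (("native", native_mix), ("synthetic", synthetic_mix)):
    for reporter_ng, carrier_ng in mix.items():
        print(f"{label} pRL-TK {reporter_ng} ng -> pUC19 carrier {carrier_ng} ng")
```

Each synthetic dose is one tenth of the corresponding native dose, so the two series can be compared pairwise at constant total DNA.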

Experimental treatment may sometimes activate cryptic sites within the
gene and cause induction or suppression of co-reporter expression, which
would compromise its function as a co-reporter for normalization of transfection
efficiencies. One example is that TPA induces expression of co-reporter vectors
harboring the wild-type gene when transfecting MCF-7 cells. 500 ng pRL-TK
(native), 5 µg native and synthetic pRG-B, 2.5 µg native and synthetic pRG-TK
were transfected per well of MCF-7 cells. 100 ng/well pGL3-control (firefly
luc+) was co-transfected with all RL plasmids. Carrier DNA, pUC19, was used
to bring the total DNA transfected to 5.1 µg/well. 15.3 µl TransFast Transfection
Reagent (Promega Corp., Madison, WI) was added per well. Sixteen hours later,
cells were trypsinized, pooled and split into six wells of a 6-well dish and
allowed to attach to the well for 8 hours. Three wells were then treated with
0.2 nM of the tumor promoter TPA (phorbol-12-myristate-13-acetate,
Calbiochem #524400-8), and three wells were mock treated with 20 µl DMSO.
Cells were harvested with 0.4 ml Passive Lysis Buffer 24 hours post TPA
addition. The results showed that by using the synthetic gene, undesirable
change of co-reporter expression by experimental stimuli can be avoided (Table
13). This demonstrates that using the synthetic gene can reduce the risk of
anomalous expression.

Table 13

TPA Induction

Vector                         RLU      Fold induction
pRL-tk untreated (native)      184
pRL-tk TPA treated (native)    812      4.4
pRG-B untreated (native)       1
pRG-B TPA treated (native)     8        8.0
pRG-B untreated (final)        132
pRG-B TPA treated (final)      195      1.47
pRG-tk untreated (native)      44
pRG-tk TPA treated (native)    192      4.36
pRG-tk untreated (final)       12,816
pRG-tk TPA treated (final)     11,347   0.88
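The last value listed for each TPA-treated vector in Table 13 is consistent with the ratio of the TPA-treated signal to the untreated signal. The sketch below recomputes those ratios from the tabulated readings; the dictionary layout and function name are assumptions made for illustration, and the recomputed values can differ from the printed ones in the last digit because the table appears to truncate rather than round.

```python
# (untreated, TPA treated) readings for each vector, taken from Table 13.
table_13 = {
    "pRL-tk (native)": (184, 812),
    "pRG-B (native)": (1, 8),
    "pRG-B (final)": (132, 195),
    "pRG-tk (native)": (44, 192),
    "pRG-tk (final)": (12816, 11347),
}


def fold_induction(untreated, treated):
    """Return the treated/untreated signal ratio."""
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return treated / untreated


folds = {vector: fold_induction(*pair) for vector, pair in table_13.items()}

for vector, fold in folds.items():
    print(f"{vector}: {fold:.3f}-fold")
```

A ratio near 1 means the co-reporter signal is essentially unaffected by TPA, which is the behavior wanted from a normalization control.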
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All pubUcations, patents and patent applications are incorporated herein 
by reference. While in the foregoing specification, this invention has been
described in relation to certain preferred embodiments thereof, and many details
have been set forth for purposes of illustration, it will be apparent to those skilled
in the art that the invention is susceptible to additional embodiments and that
certain of the details herein may be varied considerably without departing from
the basic principles of the invention.
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WHAT IS CLAIMED IS: 

1. A synthetic nucleic acid molecule comprising at least 300 nucleotides of
a coding region for a polypeptide, having a codon composition differing
at more than 25% of the codons from a wild type nucleic acid sequence
encoding a polypeptide, and having at least 3-fold fewer transcription
regulatory sequences relative to the average number of such sequences
resulting from random selections of codons at the codons which differ,
wherein the transcription regulatory sequences are selected from the
group consisting of transcription factor binding sequences, intron splice
sites, poly(A) addition sites and promoter sequences, and wherein the
polypeptide encoded by the synthetic nucleic acid molecule has at least
85% sequence identity to the polypeptide encoded by the wild type
nucleic acid sequence.

2. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has at least 5-fold fewer transcription regulatory
sequences.

3. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 35% of the codons. 

4. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 45% of the codons. 

5. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 55% of the codons.

6. The synthetic nucleic acid molecule of claim 1 wherein the majority of
codons which differ are ones that are preferred codons of a desired host
cell.

7. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule encodes a reporter molecule.

8. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule encodes a selectable marker protein.

9. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule encodes a luciferase. 

10. The synthetic nucleic acid molecule of claim 9 wherein the wild type
nucleic acid sequence encodes a Renilla luciferase.

11. The synthetic nucleic acid molecule of claim 9 wherein the wild type
nucleic acid sequence encodes a beetle luciferase.

12. The synthetic nucleic acid molecule of claim 11 wherein the synthetic
nucleic acid molecule encodes the amino acid valine at position 224.

13. The synthetic nucleic acid molecule of claim 11 wherein the synthetic
nucleic acid molecule encodes the amino acid histidine at position 224,
histidine at position 247, isoleucine at position 346, glutamine at position
348, or any combination thereof.

14. The synthetic nucleic acid molecule of claim 1 wherein the majority of
codons which differ in the synthetic nucleic acid molecule are those
which are employed more frequently in mammals.

15. The synthetic nucleic acid molecule of claim 1 wherein the majority of
codons which differ in the synthetic nucleic acid molecule are those
which are preferred codons in humans.

16. The synthetic nucleic acid molecule of claim 1 wherein the majority of 
codons which differ in the synthetic nucleic acid molecule are those 
which are preferred codons in plants. 

17. The synthetic nucleic acid molecule of claim 9 wherein the synthetic
nucleic acid molecule comprises SEQ ID NO:21 (Rlucver2) or SEQ ID
NO:22 (Rluc-final).

18. The synthetic nucleic acid molecule of claim 9 wherein the synthetic
nucleic acid molecule comprises SEQ ID NO:7 (GRver5), SEQ ID NO:8
(GRver6), SEQ ID NO:9 (GRver5.1), or SEQ ID NO:297 (GRver5.1).

19. The synthetic nucleic acid molecule of claim 9 wherein the synthetic
nucleic acid molecule comprises SEQ ID NO:14 (RDver5), SEQ ID
NO:15 (RDver7), SEQ ID NO:16 (RDver5.1), SEQ ID NO:299
(RDver5.1), SEQ ID NO:17 (RDver5.2), SEQ ID NO:18 (RD156-1H9)
or SEQ ID NO:301 (RD156-1H9).

20. The synthetic nucleic acid molecule of claim 15 wherein the majority of
codons which differ are the human codons CGC, CTG, TCT, AGC,
ACC, CCA, CCT, GCC, GGC, GTG, ATC, ATT, AAG, AAC, CAG,
CAC, GAG, GAC, TAC, TGC and TTC.

21. The synthetic nucleic acid molecule of claim 15 wherein the majority of 
codons which differ are the human codons CGC, CTG, TCT, ACC, 
CCA, GCC, GGC, GTC, and ATC or codons CGT, TTG, AGC, ACT, 
CCT, GCT, GGT, GTG and ATT.

22. The synthetic nucleic acid molecule of claim 16 wherein the majority of
codons which differ are the plant codons CGC, CTT, TCT, TCC, ACC,
CCA, CCT, GCT, GGA, GTG, ATC, ATT, AAG, AAC, CAA, CAC,
GAG, GAC, TAC, TGC and TTC.

23. The synthetic nucleic acid molecule of claim 16 wherein the majority of
codons which differ are the plant codons CGC, CTT, TCT, ACC, CCA,
GTC, GGA, GTC, and ATC or codons CGT, TGG, AGC, ACT, CCT,
GCC, GGT, GTG and ATT.

24. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule is expressed in a mammalian host cell at a level
which is greater than that of the wild type nucleic acid sequence.

25. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of CTG or TTG leucine-
encoding codons.

26. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of GTG or GTC valine-
encoding codons.

27. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of GGC or GGT glycine-
encoding codons.

28. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of ATC or ATT isoleucine-
encoding codons.

29. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of CCA or CCT proline-
encoding codons.

30. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of CGC or CGT arginine- 
encoding codons. 

31. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of AGC or TCT serine-
encoding codons.

32. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of ACC or ACT
threonine-encoding codons.

33. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule has an increased number of GCC or GCT alanine-
encoding codons.

34. The synthetic nucleic acid molecule of claim 1 wherein the codons in the 
synthetic nucleic acid molecule which differ encode the same amino 
acids as the corresponding codons in the wild type nucleic acid sequence. 

35. A plasmid comprising the synthetic nucleic acid molecule of claim 1.

36. An expression vector comprising the synthetic nucleic acid molecule of 
claim 1 linked to a promoter functional in a cell. 

37. The expression vector of claim 36 wherein the synthetic nucleic acid 
molecule is operatively linked to a Kozak consensus sequence.

38. The expression vector of claim 36 wherein the promoter is functional in a
mammalian cell.

39. The expression vector of claim 36 wherein the promoter is functional in a 
human cell. 

40. The expression vector of claim 36 wherein the promoter is functional in a 
plant cell. 

41. The expression vector of claim 36 wherein the expression vector further 
comprises a multiple cloning site. 

42. The expression vector of claim 41 wherein the expression vector
comprises a multiple cloning site positioned between the promoter and
the synthetic nucleic acid molecule.

43. The expression vector of claim 41 wherein the expression vector 
comprises a multiple cloning site positioned downstream from the 
synthetic nucleic acid molecule. 

44. A host cell comprising the expression vector of claim 36. 

45. A reporter gene expression kit comprising, in suitable container means,
the expression vector of claim 36.

46. An isolated polypeptide encoded by SEQ ID NO:9 (GRver5.1) or SEQ
ID NO:18 (RD156-1H9).

47. A polynucleotide which hybridizes under stringent hybridization
conditions to SEQ ID NO:22 (Rluc-final), SEQ ID NO:9 (GRver5.1),
SEQ ID NO:18 (RD156-1H9), SEQ ID NO:297 (GRver5.1), SEQ ID
NO:301 (RD156-1H9), or the complement thereof.

48. A method to prepare a synthetic nucleic acid molecule comprising an
open reading frame, comprising:

a) altering a plurality of transcription regulatory sequences in a parent
nucleic acid sequence which encodes a polypeptide having at least 100
amino acids to yield a synthetic nucleic acid molecule which has at least
3-fold fewer transcription regulatory sequences relative to the parent
nucleic acid sequence, wherein the transcription regulatory sequences are
selected from the group consisting of transcription factor binding
sequences, intron splice sites, poly(A) addition sites, enhancer sequences
and promoter sequences; and

b) altering greater than 25% of the codons in the synthetic nucleic acid
sequence which has a decreased number of transcription regulatory
sequences to yield a further synthetic nucleic acid molecule, wherein the
codons which are altered do not result in an increased number of
transcription regulatory sequences, wherein the further synthetic nucleic
acid molecule encodes a polypeptide with at least 85% amino acid
sequence identity to the polypeptide encoded by the parent nucleic acid
sequence.

49. A method to prepare a synthetic nucleic acid molecule comprising an 
open reading frame, comprising: 

a) altering greater than 25% of the codons in a parent nucleic acid 
sequence which encodes a polypeptide having at least 100 amino acids to 
yield a codon-altered synthetic nucleic acid molecule, and 

b) altering a plurality of transcription regulatory sequences in the codon-
altered synthetic nucleic acid molecule to yield a further synthetic nucleic
acid molecule which has at least 3-fold fewer transcription regulatory
sequences relative to a synthetic nucleic acid molecule with a random
selection of codons at the codons which differ, wherein the transcription
regulatory sequences are selected from the group consisting of
transcription factor binding sequences, intron splice sites, poly(A)
addition sites, enhancer sequences and promoter sequences, and wherein
the further synthetic nucleic acid molecule encodes a polypeptide with at
least 85% amino acid sequence identity to the polypeptide encoded by
the parent nucleic acid sequence.

50. The method of claim 48 or 49 wherein the parent nucleic acid sequence 
encodes a reporter molecule. 

51. The method of claim 48 or 49 wherein the parent nucleic acid sequence
encodes a luciferase.

52. The method of claim 48 or 49 wherein the synthetic nucleic acid
molecule hybridizes under medium stringency hybridization conditions
to the parent nucleic acid sequence.

53. The method of claim 48 or 49 wherein the codons which are altered
encode the same amino acid as the corresponding codons in the parent
nucleic acid sequence.

54. A synthetic nucleic acid molecule which is the further synthetic nucleic
acid molecule prepared by the method of claim 48 or 49.

55. A method for preparing at least two synthetic nucleic acid molecules 
which are codon distinct versions of a parent nucleic acid sequence which 
encodes a polypeptide, comprising: 

a) altering a parent nucleic acid sequence to yield a synthetic nucleic 
acid molecule having an increased number of a first plurality of codons 
that are employed more frequently in a selected host cell relative to the 
number of those codons in the parent nucleic acid sequence; and

b) altering the parent nucleic acid sequence to yield a further synthetic
nucleic acid molecule having an increased number of a second plurality
of codons that are employed more frequently in the host cell relative to
the number of those codons in the parent nucleic acid sequence, wherein
the first plurality of codons is different than the second plurality of
codons, and wherein the synthetic and the further synthetic nucleic acid
molecules encode the same polypeptide.

56. The method of claim 55 further comprising altering a plurality of
transcription regulatory sequences in the synthetic nucleic acid molecule,
the further synthetic nucleic acid molecule, or both, to yield at least one
yet further synthetic nucleic acid molecule which has at least 3-fold fewer
transcription regulatory sequences relative to the synthetic nucleic acid
molecule, the further synthetic nucleic acid molecule, or both.

57. The method of claim 55 further comprising altering at least one codon in
the first synthetic sequence to yield a first modified synthetic sequence
which encodes a polypeptide with at least one amino acid substitution
relative to the polypeptide encoded by the first synthetic nucleic acid
sequence.

58. The method of claim 56 further comprising altering at least one codon in 
the second synthetic sequence to yield a second modified synthetic 
sequence which encodes a polypeptide with at least one amino acid 
substitution relative to the polypeptide encoded by the first synthetic 
nucleic acid sequence. 

59. The method of claim 55 wherein the synthetic sequences encode a 
luciferase. 

60. The synthetic nucleic acid molecule of claim 1 wherein the synthetic
nucleic acid molecule is expressed at a level which is at least 110% of
that of the wild type nucleic acid sequence in a cell or cell extract under
identical conditions.

61. The synthetic nucleic acid molecule of claim 1 wherein the polypeptide
encoded by the synthetic nucleic acid molecule has at least 90%
contiguous sequence identity to the polypeptide encoded by the wild type
nucleic acid sequence.

62. The synthetic nucleic acid molecule of claim 1 wherein the polypeptide 
encoded by the synthetic nucleic acid molecule is identical in amino acid 
sequence to the polypeptide encoded by the wild type nucleic acid 
sequence. 

63. A vector comprising a synthetic nucleic acid molecule having at least 3- 
fold fewer transcriptional regulatory sequences relative to a vector 
comprising a parent nucleic acid sequence, wherein the transcription 
regulatory sequences are selected from the group consisting of 
transcription factor binding sequences, intron splice sites, poly(A) 
addition sites and promoter sequences. 

64. The vector of claim 63 wherein the synthetic nucleic acid molecule does 
not encode a polypeptide. 

65. The method of claim 48 or 49 further comprising altering the further
synthetic nucleic acid molecule to encode a polypeptide having at least
one amino acid substitution relative to the polypeptide encoded by the
parent nucleic acid sequence.

66. The method of claim 48 or 49 wherein the altering of transcription 
regulatory sequences does not introduce amino acid substitutions to the 
polypeptide encoded by the synthetic nucleic acid molecule. 
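The quantities recited in claim 1 (the percentage of codons that differ, the amino acid identity of the encoded polypeptides, and the number of regulatory sequence matches) can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the two sequences are short hypothetical examples far below the 300-nucleotide length recited, the helper names are invented, identity is computed position by position without alignment, and AATAAA (the canonical poly(A) signal hexamer) stands in for whatever motif set is actually searched.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(seq):
    """Split an in-frame DNA sequence into a list of codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate an in-frame DNA sequence with the standard genetic code."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def fraction_codons_differing(wild_type, synthetic):
    """Fraction of codon positions at which the two sequences differ."""
    wt, syn = split_codons(wild_type), split_codons(synthetic)
    if len(wt) != len(syn):
        raise ValueError("sequences must contain the same number of codons")
    return sum(a != b for a, b in zip(wt, syn)) / len(wt)


def protein_identity(wild_type, synthetic):
    """Position-by-position identity of the two encoded polypeptides."""
    p1, p2 = translate(wild_type), translate(synthetic)
    if len(p1) != len(p2):
        raise ValueError("encoded polypeptides must have the same length")
    return sum(a == b for a, b in zip(p1, p2)) / len(p1)


def count_motif(seq, motif):
    """Count occurrences of a motif, including overlapping ones."""
    seq, motif = seq.upper(), motif.upper()
    count = start = 0
    while True:
        idx = seq.find(motif, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


# Hypothetical toy sequences encoding the same peptide.
wild_type = "ATGGCTTTAAATAAAGGT"
synthetic = "ATGGCCCTGAACAAGGGC"

codon_diff = fraction_codons_differing(wild_type, synthetic)
identity = protein_identity(wild_type, synthetic)
print(f"codons differing: {codon_diff:.0%}")
print(f"amino acid identity: {identity:.0%}")
print("AATAAA in wild type:", count_motif(wild_type, "AATAAA"))
print("AATAAA in synthetic:", count_motif(synthetic, "AATAAA"))
```

In the toy pair, every codon after the start codon is replaced by a synonymous codon, so the encoded peptide is unchanged while the example motif is removed.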
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Figure 1 
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Figure 2 
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Figure 2 (c( " t.) 
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Figure 2 (cont.) 
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Figure 2 (cont.) 
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Figure 2 (cont) 
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Figure 2 (cont.) 



[Figure 2 (cont.): nucleotide sequence alignment, positions through 1600 and
1626, of GRVER51.SEQ, GR6.SEQ, GRVER5.SEQ, GRVER4.SEQ, GRVER3.SEQ,
GRVER2.SEQ, GRVER1.SEQ, YG81-6G1.SEQ, RDVER1.SEQ, RDVER2.SEQ, RDVER3.SEQ,
RDVER4.SEQ, RDVER5.SEQ, RD7.SEQ, RDVER51.SEQ, RDVER52.SEQ and RD1561H9.SEQ]

WO 02/16944    PCT/US01/26566



Figure 3 
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Jhigure 3 (cont) 
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Figure 4 Codori Usage Analysis 
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Figure 5 A 

Codon Usage YG#81-6G01 (yellow-green)

                     Arg
AAT  Asn  16    AGT  Ser
AAC  Asn        AGC  Ser
AAA  Lys  23    AGA  Arg
AAG  Lys  12    AGG  Arg
GAT  Asp  20    GGT  Gly  16
GAC  Asp        GGC  Gly
GAA  Glu        GGA  Gly
GAG  Glu  12    GGG  Gly   2
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Figure 5B 



Codon Usage: GRver1

TTT  Phe  12    TCT  Ser  16    TAT  Tyr   9    TGT  Cys   5
TTC  Phe  13    TCC  Ser   0    TAC  Tyr  10    TGC  Cys   6
TTA  Leu   0    TCA  Ser   0    TAA  ***   0    TGA  ***   0
TTG  Leu  27    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  14    CAT  His   6    CGT  Arg  13
CTC  Leu   0    CCC  Pro   0    CAC  His   7    CGC  Arg  13
CTA  Leu   0    CCA  Pro  14    CAA  Gln   7    CGA  Arg   0
CTG  Leu  28    CCG  Pro   0    CAG  Gln   7    CGG  Arg   0
ATT  Ile  19    ACT  Thr  11    AAT  Asn  11    AGT  Ser   0
ATC  Ile  19    ACC  Thr  11    AAC  Asn  11    AGC  Ser  15
ATA  Ile   0    ACA  Thr   0    AAA  Lys  17    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  18    AGG  Arg   0
GTT  Val   0    GCT  Ala  18    GAT  Asp  13    GGT  Gly  19
GTC  Val  25    GCC  Ala  19    GAC  Asp  13    GGC  Gly  20
GTA  Val   0    GCA  Ala        GAA  Glu  19    GGA  Gly
GTG  Val  25    GCG  Ala        GAG  Glu  19    GGG  Gly   0
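A codon usage table of this kind is a tally of how often each codon occurs in an open reading frame. The sketch below shows one way such a tally could be produced; the function name and the short example sequence are hypothetical and are not taken from any sequence in this specification.

```python
from collections import Counter


def codon_usage(seq):
    """Count how many times each codon occurs in an in-frame DNA sequence."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


# Hypothetical 7-codon reading frame: ATG CTG TTG CTG GTC GTG TAA
usage = codon_usage("ATGCTGTTGCTGGTCGTGTAA")

for codon, count in sorted(usage.items()):
    print(codon, count)
```

Because `Counter` returns 0 for keys it has not seen, codons absent from the sequence can be looked up directly when filling in a full 64-codon table.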
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Figure 5C 



Codon Usage: RDver1

TTT  Phe  13    TCT  Ser  15    TAT  Tyr  10    TGT  Cys   6
TTC  Phe  12    TCC  Ser   0    TAC  Tyr  10    TGC  Cys   5
TTA  Leu   0    TCA  Ser        TAA  ***        TGA  ***   0
TTG  Leu  27    TCG  Ser   0    TAG  ***        TGG  Trp
CTT  Leu   0    CCT  Pro  14    CAT  His        CGT  Arg  13
CTC  Leu        CCC  Pro   0    CAC  His   6    CGC  Arg  13
CTA  Leu        CCA  Pro  14    CAA  Gln        CGA  Arg
CTG  Leu  27    CCG  Pro        CAG  Gln        CGG  Arg
ATT  Ile  20    ACT  Thr  11    AAT  Asn  10    AGT  Ser
ATC  Ile  19    ACC  Thr  11    AAC  Asn  11    AGC  Ser  15
ATA  Ile   0    ACA  Thr        AAA  Lys  18    AGA  Arg
ATG  Met  11    ACG  Thr        AAG  Lys  17    AGG  Arg
GTT  Val        GCT  Ala  19    GAT  Asp  13    GGT  Gly  20
GTC  Val  24    GCC  Ala  18    GAC  Asp  13    GGC  Gly  19
GTA  Val   0    GCA  Ala        GAA  Glu  19    GGA  Gly
GTG  Val  25    GCG  Ala        GAG  Glu  19    GGG  Gly   0
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Figure 5D 



. Codon Usage: Grver2 



TTT 


Phe 


12 


TCT 


Ser 


15 


TAT 


Tyr 
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TGT 


Cys 
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TTC 


Pne 


13 


TCC 


Ser 
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' TAG 


Tyr 


10 


TGC 


Cys 


6


TTA 


Leu 




TCA 


Ser 
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TAA 


***
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TGA 


***
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TTG 


Leu 


27 


TCG 


Ser 
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TAG 


*** 
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TGG 


Trp 
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CTT 


Leu 
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CCT 


Pro 


14 


CAT 


His 
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CGT 


Arg 


13 


CTC 


Leu 
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ccc 


Pro 
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CAC 


His 
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CGC 


Arg 


13 


CTA 


Leu 
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CCA 


Pro 


14 


CAA 


Gln


10 


CGA 


Arg 
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CTG 


Leu 


28 


CCG 


Pro 
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CAG 


Gln
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CGG 


Arg 
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ATT 


Ile


20 


ACT 


Thr


11 


AAT 


Asn 


11 


AGT 


Ser 
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ATC 


Ile


18 


ACC 


Thr 


11 


AAC 


Asn 


11 


AGC 


Ser 


16 


ATA 


Ile
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ACA 


Thr 
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AAA 


Lys 


16 


AGA 


Arg 
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ATG 


Met 


11


ACG 


Thr 
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AAG 


Lys 


19 


AGG 


Arg 
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GTT 


Val 
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GCT 


Ala 


18 


GAT 


Asp 


13 


GGT 


Gly 


18


GTC 


Val 


28


GCC 


Ala 


19 


GAC 


Asp 


13 


GGC 


Gly 


21 


GTA


Val 
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GCA 


Ala 
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GAA 


Glu 


17 


GGA 


Gly 
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GTG 


Val 


22 


GCG 


Ala 
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GAG 


Glu 


21 


GGG 


Gly 


0 
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Figure 5E 



Codon Usage: RDver2



TTT 


Phe 


13 


TCT 


Ser 


16 


TTC 


Phe 


12 


TCC 


Ser 
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TTA 


Leu
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TCA 


Ser 
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TTG 


Leu 


27 


TCG 


Ser 
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CTT 


Leu 
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CCT 


Pro 


15 


CTC 


Leu 
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CCC 


Pro 
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Leu 
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CCA 


Pro 


13 


CTG 


Leu 


27 


CCG 


Pro 


0 


ATT 


Ile


19 


ACT 


Thr 


11 


ATC 


Ile


20 


ACC 


Thr 


11 


ATA 


Ile


0 


ACA 


Thr 


0 


ATG 


Met 


11 


ACG 


Thr 
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GTT 


Val 
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OCT 


Ala 


19 


GTC 


Val 


21 


GCC 


Ala 


17 


GTA 


Val 
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GCA 


Ala 
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GTG 


Val 


28 


GCG 


Ala 


0 



TAT 


Tyr 


10 


TGT 
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TAG 


Tyr 


10 
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Cys 
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TAA 
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' TGA 
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TAG 
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TGG 
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CAT 


His 
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Arg 
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His 
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Arg 
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Gin 
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Arg 
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CAG 


Gln
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Arg 
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AGT 


Ser 
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Ser 
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AAA 


Lys
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AGA 


Arg 
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AAG 


Lys 


16 


AGG 


Arg 
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GAT 


Asp 


13 


GGT 


Gly 


21 


GAC 


Asp 
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Gly 


18 


GAA 


Glu 
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GGA 


Gly 
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GAG 


Glu 


17 


GGG 


Gly 


0 
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Figure 5F 



Codon Usage: GRver3



TTT 


Phe 
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TAT 
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TTC 


Phe 
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TCC 


Ser
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TAG 


Tyr


10 


TGC
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lieu 
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Ser 
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TAA 
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TGA 
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TTG 
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TCG 


Ser 
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TAG 


***
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TGG 
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CTT 
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CAT 


His 
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CGT 


Arg
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CTC 
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Pro 
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CAC 


His 
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Axg 
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CTA 
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Pro 
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9 


CGA 


Arg 
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Pro 
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Arg 
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14 


ACT 


Thr 


14 


AAT 


Asn 


11 


AGT 


Ser 
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ATC 


Ile


24 


ACC 


Thr 
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AAC 


Asn 


11 


AGC 


Ser 


15 


ATA 


Ile
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ACA 


Thr 
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AAA 


Lys 


21 


AGA 


Arg 
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ATG 


Met 


11 
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AAG 


Lys 


14 


AGG 


Arg 
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GTT 


Val 
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GCT 


Ala 


18 


GAT 


Asp 


12 


GGT


Gly 


18 


GTC 


Val 


22 


GCC 


Ala 


18 


GAC 


Asp 


14 


GGC 


Gly 


21 


GTA 


Val 
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Ala 
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GAA 


Glu 
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GGA 


Gly 
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GTG 
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Ala 
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Gly 
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Figure 5G 



Codon Usage: RDver3



TTT 


Phe 


13 


TCT 


Ser 


14 


TAT 


Tyr
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TGT 


Cys 




TTC 


Phe 


12 


TCC 


Ser 
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Tyr 


13 


TGC 
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TTA 


Leu
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TCA 


Ser 
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TGA 


*** 
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TTG 


Leu 
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TCG 


Ser 
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TAG 


*** 
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TGG 


Trp 
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CTT 
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CGT 


Arg 
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CTC 
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CCC 


Pro 
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CAC 
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Arg 
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CTA 
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CCA 


Pro 
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CAA 
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Arg 
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CCG 


Pro 
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Gin 
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CGG 


Arg 
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20 


ACT 


Thr 


10 


AAT 


Asn 


10 


AGT 


Ser 
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ATC 


Ile
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ACC 


Thr 


12 
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Asn 


11 


AGC 


Ser 


15 


ATA 


Ile
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ACA 


Thr 
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AAA 


Lys 




AGA 


Arg 
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ATG 


Met 


11 


ACG 
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AAG 


Lys 


22 


AGG 


Arg 
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GTT 


Val 
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GCT- 


Ala 


20 


GAT 


Asp 


14 


GGT 


Gly 


16 


GTC 


Val 


27 


GCC 


Ala 


16 


GAC 


Asp 


12 
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Gly 


23 


GTA 


Val 
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GCA 


Ala 
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GAA 
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Gly 
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Gly 
Figure 5H



Codon Usage: GRver4



TTT 


Phe
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TTC 


Phe
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Tyr 
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Sex 
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TGA 


*** 




TTG 


Leu 


21 


TCG


Ser 
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TAG 


*** 
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TGG 


Trp 
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CTT 


Leu 
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CCT 


Pro 


18 


CAT 


His 
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CGT 


Arg 
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CTC 


Leu 


11 


CCC 


Pro 
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CAC 
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CGC 


Arg 


11 


CTA 


Leu 
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CCA 


Pro 


10 
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Gin 


11 
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Arg 
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CTG 


Leu 
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CCG 


Pro 
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CAG 
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CGG 


Arg 
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ATT 


Ile
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ACT 


Thr 


14 


AAT 


Asn 


11 


AGT 


Ser 
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ATC 


Ile


25 


ACC 


Thr 
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AAC 


Asn 


11 


AGC 


Ser 


14 


ATA 


Ile
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ACA 


Thr 
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AAA 


Lys 


20 


AGA 


Arg 


0 


ATG 


Met 


11 


ACG 
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AAG 


Lys 


15 


AGG 


Arg 
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GTT 


Val 
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GCT 


Ala 


19 


GAT 


Asp 


12 


GGT 


Gly 


17 


GTC 


Val 


22 


GCC 


Ala 


15 


GAC 


Asp 


14
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Gly 


19 


GTA 


Val 
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GCA 


Ala 
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GAA 
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GGA 


Gly 
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GTG 


Val 
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Ala 
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Gly 
Figure 5I



Codon Usage: RDver4



TTT 


Phe 


13 


TCT 


Ser 


11 


TAT 


Tyr 


7 


TGT 


Cys 


7 


TTC 


Phe 


12 


TCC 


Ser 


2 


TAC 


Tyr 


13 


TGC 


Cys 


4 


TTA 


Leu 
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TCA 


Ser 
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TAA 


*** 


0 


TGA 


*** 


. 6 


TTG 


Leu 


28 


TCG 


Ser 


0 


TAG 


*** 


0 


TGG 


Trp 


2


CTT 


Leu 


0 


CCT 


Pro 


16 


CAT 


His 


11 


CGT 


Arg 


15 


CTC 


Leu 


7 


CCC 


Pro 
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CAC 


His 
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CGC 


Arg 


11 


CTA 


Leu 
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CCA 


Pro 


10 


CAA 


Gln
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CGA 


Arg 
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CTG 


Leu 


20 


CCG 


Pro 
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CAG 


Gln
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CGG 


Arg 
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ATT 


Ile


21


ACT 


Thr 


11 


AAT 


Asn 


10 


AGT 


Ser 
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ATC 


Ile


18 


ACC 


Thr 


11 


AAC 


Asn 


11 


AGC 


Ser 


14 


ATA 


Ile
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ACA 


Thr 
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AAA. 


Lys 


13 


AGA


Arg 
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ATG 


Met 


11 


ACG 


Thr 
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AAG 


Lys 


22 


AGG


Arg 
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GTT 


Val 
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GCT 


Ala 


22 


GAT 


Asp 


15 


GGT 


Gly 


14 


GTC 


Val 


27 


GCC 


Ala 


11 


GAC 


Asp 


11 


GGC 


Gly 


21 


GTA 


Val 
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GCA 


Ala 
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GAA 


Glu 


18 


GGA 


Gly 
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GTG 


Val 
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Ala 


0 


GAG 


Glu 
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GGG 


Gly 


0 
Figure 5J



Codon Usage: GRver5



TTT 


Phe 
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TCT 


Ser 


TTC 


Phe 
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TCC 


Ser 


TTA 


Leu 
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Ser 


TTG 


Leu 


23 


TCG 


Ser 


CTT 


Leu 
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Pro 


CTC 
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12 


CCC 


Pro 


CTA 


Leu 
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CCA 


Pro 


CTG 


Leu 


19 
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Pro 


ATT 


Ile
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ACT 
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ATC 


Ile


23 


ACC 


Thr 


ATA 


Ile
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ACA 


Thr 


ATG 


Met 


11 


ACG 


Thr 


GTT 


Val 
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OCT 


Ala 


GTC 


Val 


21 


GCC


Ala 


GTA 


Val
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GCA 


Ala 


GTG 


Val 


25 


GCG 


Ala 



11 


TAT 


Tyr 
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TGT 


Cys 
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*±. 




Tyr 






Cys 
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TAA 
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TGA 
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TAG 


***
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TGG 


Trp 
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17 


CAT 


His 
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CGT 


Arg 


13 
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CAC 


His 
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CGC 


Arg 


11 
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CAA 


Gln


11 


CGA 


Arg 
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CAG 


Gln
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CGG 


Arg 
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X4 


AAT 


Asn 
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AGT 


Ser 


1 
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AAC 


Asn 


13 


AGC


Ser 


14 


0 


AAA 


Lys 


19 


AGA 


Arg 


0 


0 


AAG 


Lys 


16 


AGG 


Arg 


0 


18 


GAT 


Asp 


12 


GGT 


Gly 


16 


14 


GAC 


Asp 


14 


GGC 


Gly 


21 
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GAA 


Glu 


19 


GGA 


Gly 
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GAG 


Glu 


19 


GGG 


Gly 


1 
Figure 5K



Codon Usage: RDver5



TTT Phe 13    TCT Ser 12    TAT Tyr  7    TGT Cys  7
TTC Phe 12    TCC Ser  2    TAC Tyr 13    TGC Cys  4
TTA Leu  0    TCA Ser  2    TAA ***  0    TGA ***  0
TTG Leu 25    TCG Ser  0    TAG ***  0    TGG Trp  2

CTT Leu  1    CCT Pro 15    CAT His  9    CGT Arg 14
CTC Leu 11    CCC Pro  1    CAC His  4    CGC Arg 12
CTA Leu  0    CCA Pro 12    CAA Gln  7    CGA Arg  0
CTG Leu 18    CCG Pro  0    CAG Gln  8    CGG Arg  0

ATT Ile 19    ACT Thr 10    AAT Asn  9    AGT Ser  2
ATC Ile 20    ACC Thr 11    AAC Asn 12    AGC Ser 12
ATA Ile  0    ACA Thr  1    AAA Lys 13    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 22    AGG Arg  0

GTT Val  5    GCT Ala 21    GAT Asp 14    GGT Gly 14
GTC Val 26    GCC Ala 12    GAC Asp 12    GGC Gly 21
GTA Val  1    GCA Ala  4    GAA Glu 18    GGA Gly  3
GTG Val 17    GCG Ala  0    GAG Glu 20    GGG Gly  1
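
The codon usage tables of Figures 5D to 5K list, for each of the 64 codons, the encoded amino acid and the number of times the codon occurs in the reading frame. As an illustration only, the following minimal Python sketch shows one way such counts could be tallied. The function name codon_usage and the toy reading frame are hypothetical and are not taken from this document, and the sketch is not asserted to be the method used to prepare the figures.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence written 5' to 3'."""
    seq = "".join(cds.split()).upper()  # drop whitespace, normalise case
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Step through the sequence three nucleotides at a time.
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


# Hypothetical toy reading frame, not a sequence from this document.
toy_usage = codon_usage("ATG ATG AAG CGT GAG AAA TAA")
for codon in sorted(toy_usage):
    print(codon, toy_usage[codon])
```

A Counter returns zero for codons that never occur, which mirrors the zero entries shown for avoided codons in the tables.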
Figure 6



Synthetic oligos for engineered GR/RD genes
(All oligos listed 5' to 3')

Coding strand: 5' ( )n 3'

Non-coding strand: 3' ( )n 5'

Oligos with pRAM flanking sequence identical for GR/RD

1) coding strand upstream flanking

RAM-C1: ACGCCAGCCCAAGCTTAGGCCTGAGTGGC (SEQ ID NO: 35)

RAM-C2: CTTAATTCTCCCCATCCCCCTGTTGACAATTAATCATCGGCTCG (SEQ ID NO: 36)

RAM-C3: TATAATGTGAGGAATTGCGAGCGGATAACAATTTCACACA (SEQ ID NO: 37)

2) coding strand downstream flanking

RAM-C4: ATGGGATGTTACCTAGACCAATATGAAATATTTGGTAAAT (SEQ ID NO: 38)

RAM-C5: AAATGCTTAATGAATTTCAAAAAAAAAAAAAAAGGAATTC (SEQ ID NO: 39)

RAM-C6: GATATCAAGCTTATCGATACCGTCGACCTCGAGGATTATA (SEQ ID NO: 40)

RAM-C7: TAGAAAAAGGCCTCGGCGGCCGCTAGTTCAGTCAGTT (SEQ ID NO: 41)

3) non-coding strand downstream flanking

RAM-N1: AACTGACTGAACTAGCG (SEQ ID NO: 42)

RAM-N2: GCCGCCGAGGCCTTTTTCTATATAATCCTCGAGGTCGACG (SEQ ID NO: 43)

RAM-N3: GTATCGATAAGCTTGATATCGAATTCCTTTTTTTTTTTTT (SEQ ID NO: 44)

RAM-N3b: AGCTTGATATCGAATTCCTTTTTTTTTTTTTTTGAAATTC (SEQ ID NO: 45)

RAM-N4: TTGAAATTCATTAAGCATTTATTTACCAAATATTTCATAT (SEQ ID NO: 46)

RAM-N5: TGGTCTAGGTAACATCCCATCACTAGCTTTTTTTTCTATA (SEQ ID NO: 47)

4) non-coding strand upstream flanking

RAM-N6: TCGCAATTCCTCACATTATACGAGCCGATGATTAATTGTC (SEQ ID NO: 48)

RAM-N7: AACAGGGGGATGGGGAGAATTAAGGCCACTCAGGCCTAAGCTTGGGCTGGCGT (SEQ ID NO: 49)
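
The coding and non-coding strand oligos listed above are offset relative to one another, so that a non-coding oligo, read as its reverse complement, spans the end of one coding oligo and the start of the next. The following Python sketch is offered only as an illustrative check of that relationship, using RAM-C6, RAM-C7, RAM-N2 and RAM-N1 as transcribed above. The helper names reverse_complement and bridges and the 20-nucleotide half-overlap parameter are assumptions introduced here, not terms from this document.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def bridges(noncoding: str, upstream: str, downstream: str, half: int = 20) -> bool:
    """True if the non-coding oligo pairs with the last `half` nucleotides of
    the upstream coding oligo followed by the first `half` nucleotides of the
    downstream coding oligo (assumed overlap scheme)."""
    return reverse_complement(noncoding) == upstream[-half:] + downstream[:half]


# Sequences as listed in Figure 6 (5' to 3').
RAM_C6 = "GATATCAAGCTTATCGATACCGTCGACCTCGAGGATTATA"
RAM_C7 = "TAGAAAAAGGCCTCGGCGGCCGCTAGTTCAGTCAGTT"
RAM_N2 = "GCCGCCGAGGCCTTTTTCTATATAATCCTCGAGGTCGACG"
RAM_N1 = "AACTGACTGAACTAGCG"

print(bridges(RAM_N2, RAM_C6, RAM_C7))
print(reverse_complement(RAM_N1) == RAM_C7[-17:])
```

Not every oligo in the listing is exactly 40 nucleotides long, so the fixed half-overlap of 20 is a simplification; a junction that falls at a different position would need a different value of half.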

GRverB with flanking seq. of pRAM to end of Sfi I primers

1) Coding strand (Start and stop codons are underlined)

GR-C1: GGAAACAGGATCCC ATGATGA AACGCGAAAAGAACGTGAT (SEQ ID NO: 50)

GR-C2: CTACGGCCCAGAACCACTGCATCCACTGGAAGACCTCACC (SEQ ID NO: 51)

GR-C3: GCTCGTGAGATGCTCTTCCGAGCACTGCGTAAACATAGTC (SEQ ID NO: 52)

GR-C4: ACCTCCCTCAAGCACTCGTGGACGTCGTGGGAGACGAGAG (SEQ ID NO: 53)

GR-C5: CCTCTCCTACAAAGAATTTTTCGAAGCTACTGTGCTGTTG (SEQ ID NO: 54)

GR-C6: GCCCAAAGCCTCCATAATTGTGGGTACAAAATGAACGATG (SEQ ID NO: 55)

GR-C7: TGGTGAGCATTTGTGCTGAGAATAACACTCGCTTCTTTAT (SEQ ID NO: 56)

GR-C8: TCCTGTAATCGCTGCTTGGTACATCGGCATGATTGTCGCC (SEQ ID NO: 57)

GR-C9: CCTGTGAATGAATCTTACATCCCAGATGAGCTGTGTAAGG (SEQ ID NO: 58)

GR-C10: TTATGGGTATTAGCAAACCTCAAATCGTCTTTACTACCAA (SEQ ID NO: 59)

GR-C11: AAACATCTTGAATAAGGTCTTGGAAGTCCAGTCTCGTACT (SEQ ID NO: 60)

GR-C12: AACTTCATCAAACGCATCATTATTCTGGATACCGTCGAAA (SEQ ID NO: 61)

GR-C13: ACATCCACGGCTGTGAGAGCCTCCCTAACTTCATCTCTCG (SEQ ID NO: 62)

GR-C14: TTACAGCGATGGTAATATCGCTAATTTCAAGCCCTTGCAT (SEQ ID NO: 63)

GR-C15: TTTGATCCAGTCGAGCAAGTGGCCGCTATTTTGTGCTCCT (SEQ ID NO: 64)

GR-C16: CCGGCACCACTGGTTTGCCTAAAGGTGTCATGCAGACTCA (SEQ ID NO: 65)

GR-C17: CCAGAATATCTGTGTGCGTTTGATCCACGCTCTCGACCCT (SEQ ID NO: 66)

GR-C18: CGTGTGGGTACTCAATTGATCCCTGGCGTGACTGTGCTGG (SEQ ID NO: 67)

GR-C19: TGTATCTGCCTTTCTTTCACGCCTTTGGTTTCTCTATTAC (SEQ ID NO: 68)

GR-C20: CCTGGGCTATTTCATGGTCGGCTTGCGTGTCATCATGTTT (SEQ ID NO: 69)
GR-C21: CGTCGCTTCGACCAAGAAGCCTTCTTGAAGGCTATTCAAG (SEQ ID NO: 70)

GR-C22: ACTACGAGGTGCGTTCCGTGATCAACGTCCCTTCAGTCAT (SEQ ID NO: 71)

GR-C23: TTTGTTCCTGAGCAAATCTCCTTTGGTTGACAAGTATGATCTG (SEQ ID NO: 72)

GR-C24: AGCAGCTTGCGTGAGCTGTGCTGTGGCGCTGCTCCTT (SEQ ID NO: 73)

GR-C25: TGGCCAAAGAAGTGGCCGAGGTCGCTGCTAAGCGTCTGAA (SEQ ID NO: 74)

GR-C26: CCTCCCTGGTATCCGCTGCGGTTTTGGTTTGACTGAGAGC (SEQ ID NO: 75)

GR-C27: ACTTCTGCTAACATCCATAGCTTGCGAGACGAGTTTAAGT (SEQ ID NO: 76)

GR-C28: CTGGTAGCCTGGGTCGCGTGACTCCTCTTATGGCTGCAAA (SEQ ID NO: 77)

GR-C29: GATCGCCGACCGTGAGACCGGCAAAGCACTGGGCCCAAAT (SEQ ID NO: 78)

GR-C30: CAAGTCGGTGAATTGTGTATTAAGGGCCCTATGGTCTCTA (SEQ ID NO: 79)

GR-C31: AAGGCTACGTGAACAATGTGGAGGCCACTAAAGAAGCCAT (SEQ ID NO: 80)

GR-C32: TGATGATGATGGCTGGCTCCATAGCGGCGACTTCGGTTAC (SEQ ID NO: 81)

GR-C33: TATGATGAGGACGAACACTTCTATGTGGTCGATCGCTACA (SEQ ID NO: 82)

GR-C34: AAGAATTGATTAAGTACAAAGGCTCTCAAGTCGCACCAGC (SEQ ID NO: 83)

GR-C35: CGAACTGGAAGAAATTTTGCTGAAGAACCCTTGTATCCGC (SEQ ID NO: 84)

GR-C36: GACGTGGCCGTCGTGGGTATCCCAGACTTGGAAGCTGGCG (SEQ ID NO: 85)

GR-C37: AGTTGCCTAGCGCCTTTGTGGTGAAACAACCCGGCAAGGA (SEQ ID NO: 86)

GR-C38: GATCACTGCTAAGGAGGTCTACGACTATTTGGCCGAGCGC (SEQ ID NO: 87)

GR-C39: GTGTCTCACAGCAAATATCTGCGTGGCGGCGTCCGCTTCG (SEQ ID NO: 88)

GR-C40: TCGATTCTATTCCACGCAACGTTACCGGTAAGATCACTCG (SEQ ID NO: 89)

GR-C41: TAAAGAGTTGCTGAAGCAACTCCTCGAAAAAGCTGGCGGC (SEQ ID NO: 90)

GR-C42: TAGTAA AGTCTTCATGATTATATAGAAAAAAAAGCTAGTG (SEQ ID NO: 91)

2) non-coding strand

GR-N1: TAATCATGAAGAC TTTACTA GCCGCCAGCTTTTTCGAGGA (SEQ ID NO: 92)

GR-N2: GTTGCTTCAGCAACTCTTTACGAGTGATCTTACCGGTAAC (SEQ ID NO: 93)

GR-N3: GTTGCGTGGAATAGAATCGACGAAGCGGACGCCGCCACG (SEQ ID NO: 94)

GR-N4: CAGATATTTGGTGTGAGACACGCGCTCGGCCAAATAGTCGT (SEQ ID NO: 95)

GR-N5: AGACCTCCTTAGCAGTGATCTCCTTGCCGGGTTGTTTCAC (SEQ ID NO: 96)

GR-N6: CACAAAGGCGCTAGGCAACTCGCCAGCTTCCAAGTCTGGG (SEQ ID NO: 97)

GR-N7: ATACCCACGACGGCCACGTCGCGGATACAAGGGTTCTTCA (SEQ ID NO: 98)

GR-N8: GCAAAATTTCTTCCAGTTCGGCTGGTGCGACTTGAGAGCC (SEQ ID NO: 99)

GR-N9: TTTGTACTTAATCAATTCTTTGTAGCGATCGACCACATAG (SEQ ID NO: 100)

GR-N10: AAGTGTTCGTCCTCATCATAGTAACCGAAGTCGCCGCTAT (SEQ ID NO: 101)

GR-N11: GGAGCCAGCCATCATCATCAATGGCTTCTTTAGTGGCCTC (SEQ ID NO: 102)

GR-N12: CACATTGTTCACGTAGCCTTTAGAGACCATAGGGCCCTTA (SEQ ID NO: 103)

GR-N13: ATACACAATTCACCGACTTGATTTGGGCCCAGTGCTTTGC (SEQ ID NO: 104)

GR-N14: CGGTCTCACGGTCGGCGATCTTTGCAGCCATAAGAGGAGT (SEQ ID NO: 105)

GR-N15: CACGCGACCCAGGCTACCAGACTTAAACTCGTCTCGCAAG (SEQ ID NO: 106)

GR-N16: CTATGGATGTTAGCAGAAGTGCTCTCAGTCAAACCAAAAC (SEQ ID NO: 107)

GR-N17: CGCAGCGGATACCAGGGAGGTTCAGACGCTTAGCAGCGAC (SEQ ID NO: 108)

GR-N18: CTCGGCCACTTCTTTGGCCAAAGGAGCAGCGCCACAGCA (SEQ ID NO: 109)

GR-N19: AGCTCACGCAAGCTGCTCAGATCATACTTGTCAACCAAAG (SEQ ID NO: 110)

GR-N20: GAGATTTGCTCAGGAACAAAATGACTGAAGGGACGTTGAT (SEQ ID NO: 111)

GR-N21: CACGGAACGCACCTCGTAGTCTTGAATAGCCTTCAA (SEQ ID NO: 112)

GR-N22: GAAGGCTTCTTGGTCGAAGCGACGAAACATGATGACACGCAAGC (SEQ ID NO: 113)

GR-N23: CGACCATGAAATAGCCCAGGGTAATAGAGAAACCAAAGGC (SEQ ID NO: 114)

GR-N24: GTGAAAGAAAGGCAGATACACCAGCACAGTCACGCCAGGG (SEQ ID NO: 115)

GR-N25: ATCAATTGAGTACCCACACGAGGGTCGAGAGCGTGGATCA (SEQ ID NO: 116)

GR-N26: AACGCACACAGATATTCTGGTGAGTCTGCATGACACCTTT (SEQ ID NO: 117)

GR-N27: AGGCAAACCAGTGGTGCCGGAGGAGCACAAAATAGCGGCC (SEQ ID NO: 118)

GR-N28: ACTTGCTCGACTGGATCAAAATGCAAGGGCTTGAAATTAG (SEQ ID NO: 119)

GR-N29: CGATATTACCATCGCTGTAACGAGAGATGAAGTTAGGGAG (SEQ ID NO: 120)

GR-N30: GCTCTCACAGCCGTGGATGTTTTCGACGGTATCCAGAATA (SEQ ID NO: 121)

GR-N31: ATGATGCGTTTGATGAAGTTAGTACGAGACTGGACTTCCA (SEQ ID NO: 122)

GR-N32: AGACCTTATTCAAGATGTTTTTGGTAGTAAAGACGATTTG (SEQ ID NO: 123)

GR-N33: AGGTTTGCTAATACCCATAACCTTACACAGCTCATCTGGG (SEQ ID NO: 124)

GR-N34: ATGTAAGATTCATTCACAGGGGCGACAATCATGCCGATGT (SEQ ID NO: 125)

GR-N35: ACCAAGCAGCGATTACAGGAATAAAGAAGCGAGTGTTATT (SEQ ID NO: 126)

GR-N36: CTCAGCACAAATGCTCACCACATCGTTCATTTTGTACCCA (SEQ ID NO: 127)

GR-N37: CAATTATGGAGGCTTTGGGCCAACAGCACAGTAGCTTCGA (SEQ ID NO: 128)

GR-N38: AAAATTCTTTGTAGGAGAGGCTCTCGTCTCCCACGACGTC (SEQ ID NO: 129)

GR-N39: CACGAGTGCTTGAGGGAGGTGACTATGTTTACGCAGTGCT (SEQ ID NO: 130)

GR-N40: CGGAAGAGCATCTCACGAGCGGTGAGGTCTTCCAGTGGAT (SEQ ID NO: 131)

GR-N41: GCAGTGGTTCTGGGCCGTAGATCACGTTCTTTTCGCGTTT (SEQ ID NO: 132)

GR-N42: CATCAT GGGATCCTGTTTCCTGTGTGAAATTGTTATCCGC (SEQ ID NO: 133)

RDverS with flanking sequence of pRAM to end of Sfi I primers
1) coding strand

RD-C1: GGAAACAGGATCCCATGATGAAGCGTGAGAAAAATGTCAT (SEQ ID NO: 134)

RD-C2: CTATGGCCCTGAGCCTCTCCATCCTTTGGAGGATTTGACT (SEQ ID NO: 135)

RD-C3: GCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGCACTCTC (SEQ ID NO: 136)

RD-C4: ATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATC (SEQ ID NO: 137)

RD-C5: TTTGAGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTG (SEQ ID NO: 138)

RD-C6: GCTCAGTCCCTCCACAATTGTGGCTACAAGATGAACGACG (SEQ ID NO: 139)

RD-C7: TCGTTAGTATCTGTGCTGAAAACAATACCCGTTTCTTCAT (SEQ ID NO: 140)

RD-C8: TCCAGTCATCGCCGCATGGTATATCGGTATGATCGTGGCT (SEQ ID NO: 141)

RD-C9: CCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAG (SEQ ID NO: 142)

RD-C10: TCATGGGTATCTCTAAGCCACAGATTGTCTTCACCACTAA (SEQ ID NO: 143)

RD-C11: GAATATTCTGAACAAAGTCCTGGAAGTCCAAAGCCGCACC (SEQ ID NO: 144)

RD-C12: AACTTTATTAAGCGTATCATCATCTTGGACACTGTGGAGA (SEQ ID NO: 145)

RD-C13: ATATTCACGGTTGCGAATCTTTGCCTAATTTCATCTCTCG (SEQ ID NO: 146)

RD-C14: CTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCAC (SEQ ID NO: 147)

RD-C15: TTCGACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCA (SEQ ID NO: 148)

RD-C16: GCGGTACTACTGGACTCCCAAAGGGAGTCATGCAGACCCA (SEQ ID NO: 149)

RD-C17: TCAAAACATTTGCGTGCGTCTGATCCATGCTCTCGATCCA (SEQ ID NO: 150)

RD-C18: CGCTACGGCACTCAGCTGATTCCTGGTGTCACCGTCTTGG (SEQ ID NO: 151)

RD-C19: TCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTAC (SEQ ID NO: 152)

RD-C20: TTTGGGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTC (SEQ ID NO: 153)

RD-C21: CGCCGTTTTGATCAGGAGGCTTTCTTGAAAGCCATCCAAG (SEQ ID NO: 154)

RD-C22: ATTATGAAGTCCGCAGTGTCATCAACGTGCCTAGCGTGAT (SEQ ID NO: 155)

RD-C23: CCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAGTACGAC (SEQ ID NO: 156)

RD-C24: TTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCAC (SEQ ID NO: 157)

RD-C25: TGGCTAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAA (SEQ ID NO: 158)

RD-C26: TCTTCCAGGGATTCGTTGTGGCTTCGGCCTCACCGAATCT (SEQ ID NO: 159)

RD-C27: ACCAGCGCTATTATTCAGTCTCTCCGCGATGAGTTTAAGA (SEQ ID NO: 160)

RD-C28: GCGGCTCTTTGGGCCGTGTCACTCCACTCATGGCTGCTAA (SEQ ID NO: 161)

RD-C29: GATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCTAAC (SEQ ID NO: 162)

RD-C30: CAAGTGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCA (SEQ ID NO: 163)

RD-C31: AGGGTTATGTCAATAACGTCGAAGCTACCAAGGAGGCCAT (SEQ ID NO: 164)

RD-C32: CGACGACGACGGCTGGTTGCATTCTGGTGATTTTGGATAT (SEQ ID NO: 165)

RD-C33: TACGACGAAGATGAGCATTTTTACGTCGTGGATCGTTACA (SEQ ID NO: 166)

RD-C34: AGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGC (SEQ ID NO: 167)

RD-C35: TGAGTTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGC (SEQ ID NO: 168)
RD-C36: GATGTCGCTGTGGTCGGCATTCCTGATCTGGAGGCCGGCG (SEQ ID NO: 169)

RD-C37: AACTGCCTTCTGCTTTCGTTGTCAAGCAGCCTGGTAAAGA (SEQ ID NO: 170)

RD-C38: AATTACCGCCAAAGAAGTGTATGATTACCTGGCTGAACGT (SEQ ID NO: 171)

RD-C39: GTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTG (SEQ ID NO: 172)

RD-C40: TTGACTCCATCCCTCGTAACGTAACAGGCAAAATTACCCG (SEQ ID NO: 173)

RD-C41: CAAGGAGCTGTTGAAACAATTGTTGGAGAAGGCCGGCGGT (SEQ ID NO: 174)

RD-C42: TAGTAAA GTCTTCATGATTATATAGAAAAAAAAGCTAGTG (SEQ ID NO: 175)



2) non-coding strand

RD-N1: TAATCATGAAGACTTTACTAACCGCCGGCCTTCTCCAACA (SEQ ID NO: 176)

RD-N2: ATTGTTTCAACAGCTCCTTGCGGGTAATTTTGCCTGTTAC (SEQ ID NO: 177)

RD-N3: GTTACGAGGGATGGAGTCAACAAAACGCACGCCGCCACGC (SEQ ID NO: 178)

RD-N4: AAGTACTTAGTATGGCTCACACGTTCAGCCAGGTAATCAT (SEQ ID NO: 179)

RD-N5: ACACTTCTTTGGCGGTAATTTCTTTACCAGGCTGCTTGAC (SEQ ID NO: 180)

RD-N6: AACGAAAGCAGAAGGCAGTTCGCCGGCCTCCAGATCAGGA (SEQ ID NO: 181)

RD-N7: ATGCCGACCACAGCGACATCGCGAATGCATGGATTTTTCA (SEQ ID NO: 182)

RD-N8: ACAGAATCTCCTCCAACTCAGCTGGAGCAACCTGGCTACC (SEQ ID NO: 183)

RD-N9: CTTGTATTTGATCAGGTCCTTGTAACGATCCACGACGTAA (SEQ ID NO: 184)

RD-N10: AAATGCTCATCTTCGTCGTAATATCCAAAATCACCAGAAT (SEQ ID NO: 185)

RD-N11: GCAACCAGCCGTCGTCGTCGATGGCCTCCTTGGTAGCTTC (SEQ ID NO: 186)

RD-N12: GACGTTATTGACATAACCCTTGCTCACCATAGGGCCTTTG (SEQ ID NO: 187)

RD-N13: ATACACAGCTCGCCCACTTGGTTAGGGCCCAAAGCCTTAC (SEQ ID NO: 188)

RD-N14: CAGTTTCGCGATCAGCGATCTTAGCAGCCATGAGTGGAGT (SEQ ID NO: 189)

RD-N15: GACACGGCCCAAAGAGCCGCTCTTAAACTCATCGCGGAGA (SEQ ID NO: 190)

RD-N16: GACTGAATAATAGCGCTGGTAGATTCGGTGAGGCCGA (SEQ ID NO: 191)

RD-N17: AGCCACAACGAATCCCTGGAAGATTCAAGCGTTTGGCGGCCAC (SEQ ID NO: 192)

RD-N18: TTCAGCGACCTCCTTAGCCAGTGGAGCGGCACCGCAACAC (SEQ ID NO: 193)

RD-N19: AATTCACGCAGTGAAGACAAGTCGTACTTGTCCACGAGTG (SEQ ID NO: 194)

RD-N20: GGCTCTTAGACAAAAACAGGATCACGCTAGGCACGTTGAT (SEQ ID NO: 195)

RD-N21: GACACTGCGGACTTCATAATCTTGGATGGCTTTCAAGAAA (SEQ ID NO: 196)

RD-N22: GCCTCCTGATCAAAACGGCGGAACATAATCACGCGGAGAC (SEQ ID NO: 197)

RD-N23: CGACCATAAAGTAACCCAAAGTAATATGAAAGCCGAAAGC (SEQ ID NO: 198)

RD-N24: ATGGAAGAAAGGCAAGTAGACCAAGACGGTGACACCAGGA (SEQ ID NO: 199)

RD-N25: ATCAGCTGAGTGCCGTAGCGTGGATCGAGAGCATGGATCA (SEQ ID NO: 200)

RD-N26: GACGCACGCAAATGTTTTGATGGGTCTGCATGACTCCCTT (SEQ ID NO: 201)

RD-N27: TGGGAGTCCAGTAGTACCGCTGCTACACAGAATGGCTGCA (SEQ ID NO: 202)

RD-N28: ACTTGTTCCACAGGGTCGAAGTGGAGTGGTTTAAAGTTTG (SEQ ID NO: 203)

RD-N29: CGATGTTGCCGTCTGAATAGCGAGAGATGAAATTAGGCAA (SEQ ID NO: 204)

RD-N30: AGATTCGCAACCGTGAATATTCTCCACAGTGTCCAAGATG (SEQ ID NO: 205)

RD-N31: ATGATACGCTTAATAAAGTTGGTGCGGCTTTGGACTTCCA (SEQ ID NO: 206)

RD-N32: GGACTTTGTTCAGAATATTGTTAGTGGTGAAGACAATCTG (SEQ ID NO: 207)

RD-N33: TGGCTTAGAGATACCCATGACTTTACACAGTTCGTCGGGA (SEQ ID NO: 208)

RD-N34: ATGTAGCTCTCGTTGACTGGAGCCACGATCATACCGATAT (SEQ ID NO: 209)

RD-N35: ACCATGCGGCGATGACTGGAATGAAGAAACGGGTATTGTT (SEQ ID NO: 210)

RD-N36: TTCAGCACAGATACTAACGACGTCGTTCATCTTGTAGCCA (SEQ ID NO: 211)

RD-N37: CAATTGTGGAGGGACTGAGCCAGCAAGACGGTTGCCTCAA (SEQ ID NO: 212)

RD-N38: AAAACTCCTTGTAGCTCAAAGATTCATCGCCGACCACATC (SEQ ID NO: 213)

RD-N39: GACCAAGGCTTGAGGCAAATGAGAGTGCTTGCGGAGAGCA (SEQ ID NO: 214)

RD-N40: CGAAACAGCATTTCGCCGGCAGTCAAATCCTCCAAAGGAT (SEQ ID NO: 215)


RD 


-N41 


: GGAGAGGCTCAGGGCCATAGATGACATTTTTCTCACGCTT 


(SEQ 


ID 


NO: 


216) 


RD 


-N42 


:CATCATGGGATCCTGTTTCCTGTGTGAAATTGTTATCCGC 


(SEQ 


ID 


NO: 


217) 
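The coding and non-coding oligonucleotides listed above appear to be staggered so that each non-coding oligonucleotide bridges two neighbouring coding oligonucleotides. The following is a minimal sketch of how that relationship can be checked, using RD-C41, RD-C42 and RD-N1 as transcribed from the lists above (with OCR whitespace removed). The function names and the 20-nucleotide overlap are assumptions made for illustration; they are not stated in the source.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bridges(non_coding: str, upstream: str, downstream: str, overlap: int = 20) -> bool:
    """True if `non_coding` pairs with the last `overlap` bases of `upstream`
    followed by the first `overlap` bases of `downstream` (hypothetical helper)."""
    return reverse_complement(non_coding) == upstream[-overlap:] + downstream[:overlap]


# Oligonucleotides as transcribed from the lists above.
RD_C41 = "CAAGGAGCTGTTGAAACAATTGTTGGAGAAGGCCGGCGGT"
RD_C42 = "TAGTAAAGTCTTCATGATTATATAGAAAAAAAAGCTAGTG"
RD_N1 = "TAATCATGAAGACTTTACTAACCGCCGGCCTTCTCCAACA"

print(bridges(RD_N1, RD_C41, RD_C42))
```

A check of this kind is also a convenient way to spot transcription errors in a printed oligonucleotide list, since a single wrong base breaks the match.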
Figure 7 (Cont.)

Alignment of the nucleotide sequences RELLUC.SEQ, RLUCVER1.SEQ, RLUCVER2.SEQ and RLUCFINL.SEQ (positions 1 to 200).
Figure 8
Figure 9A

Codon usage in RELLUC

(Renilla reniformis; GenBank ACCESSION: M63501; Medline: 91239583)

                                             GGC Gly  4
GTA Val  6    GCA Ala  8    GAA Glu 25    GGA Gly  3
GTG Val  3    GCG Ala  3    GAG Glu  5    GGG Gly  0
Figure 9B

Codon Usage in Rluc-final

TTT Phe  4    TCT Ser  0    TAT Tyr  2    TGT Cys  1
TTC Phe 12    TCC Ser 10    TAC Tyr 11    TGC Cys  2
TTA Leu  0    TCA Ser  1    TAA ***  0    TGA ***  0
TTG Leu  0    TCG Ser  0    TAG ***  0    TGG Trp  8
CTT Leu  3    CCT Pro 11    CAT His  2    CGT Arg  0
CTC Leu  6    CCC Pro  3    CAC His  8    CGC Arg  7
CTA Leu  0    CCA Pro  4    CAA Gln  3    CGA Arg  0
CTG Leu 13    CCG Pro  0    CAG Gln  4    CGG Arg  3
ATT Ile  3    ACT Thr  1    AAT Asn  2    AGT Ser  1
ATC Ile 18    ACC Thr  4    AAC Asn 11    AGC Ser  7
ATA Ile  0    ACA Thr  0    AAA Lys  4    AGA Arg  2
ATG Met  9    ACG Thr  0    AAG Lys 23    AGG Arg  1
GTT Val  2    GCT Ala 11    GAT Asp  6    GGT Gly  3
GTC Val  8    GCC Ala  9    GAC Asp 11    GGC Gly  7
GTA Val  0    GCA Ala  0    GAA Glu  2    GGA Gly  3
GTG Val 13    GCG Ala  0    GAG Glu 28    GGG Gly  4
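A codon usage table of this kind can be produced by splitting a coding sequence into consecutive triplets and counting them. The sketch below is illustrative only: the function name is hypothetical, and the example input is the first twelve codons of the synthetic Renilla luciferase gene as they appear in oligonucleotide RLS1 of Figure 10, assuming the reading frame starts at the ATG.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count codons in a coding sequence read in frame from its first base."""
    cds = cds.upper().replace(" ", "")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Step through the sequence three bases at a time.
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


# First 12 codons of the synthetic gene (RLS1 without its 5' AACC leader).
example = "ATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA"
usage = codon_usage(example)

for codon, count in sorted(usage.items()):
    print(codon, count)
```

Applied to a complete open reading frame, the counts can be laid out in the same 16-row by 4-column arrangement used in Figures 9A and 9B.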
Figure 10

Oligonucleotides for the assembly of synthetic Renilla luciferase gene

Sense Strand

Oligo name          Oligo sequence from 5' to 3'
RLS1 (1-40)         AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA (SEQ ID NO:246)
RLS2 (41-80)        CGCATGATCACTGGGCCTCAGTGGTGGGCTCGCTGCAAGC (SEQ ID NO:247)
RLS3 (81-120)       AAATGAACGTGCTGGACTCCTTCATCAACTACTATGATTC (SEQ ID NO:248)
RLS4 (121-170)      CGAGAAGCACGCCGAGAACGCCGTGATTTTTCTGCATGGTAACGCTGCCT (SEQ ID NO:249)
RLS5 (171-210)      CCAGCTACCTGTGGAGGCACGTCGTGCCTCACATCGAGCC (SEQ ID NO:250)
RLS6 (211-250)      CGTGGCTAGATGCATCATCCCTGATCTGATCGGAATGGGT (SEQ ID NO:251)
RLS7 (251-290)      AAGTCCGGCAAGAGCGGGAATGGCTCATATCGCCTCCTGG (SEQ ID NO:252)
RLS8 (291-330)      ATCACTACAAGTACCTCACCGCTTGGTTCGAGCTGCTGAA (SEQ ID NO:253)
RLS9 (331-370)      CCTTCCAAAGAAAATCATCTTTGTGGGCCACGACTGGGGG (SEQ ID NO:254)
RLS10 (371-410)     GCTTGTCTGGCCTTTCACTACTCCTACGAGCACCAAGACA (SEQ ID NO:255)
RLS11 (411-450)     AGATCAAGGCCATCGTCCATGCTGAGAGTGTCGTGGACGT (SEQ ID NO:256)
RLS12 (451-495)     GATCGAGTCCTGGGACGAGTGGCCTGACATCGAGGAGGATATCGC (SEQ ID NO:257)
RLS13 (496-535)     CCTGATCAAGAGCGAAGAGGGCGAGAAAATGGTGCTTGAG (SEQ ID NO:258)
RLS14 (536-575)     AATAACTTCTTCGTCGAGACCATGCTCCCAAGCAAGATCA (SEQ ID NO:259)
RLS15 (576-620)     TGCGGAAACTGGAGCCTGAGGAGTTCGCTGCCTACCTGGAGCCAT (SEQ ID NO:260)
RLS16 (621-660)     TCAAGGAGAAGGGCGAGGTTAGACGGCCTACCCTCTCCTG (SEQ ID NO:261)
RLS17 (661-700)     GCCTCGCGAGATCCCTCTCGTTAAGGGAGGCAAGCCCGAC (SEQ ID NO:262)
RLS18 (701-740)     GTCGTCCAGATTGTCCGCAACTACAACGCCTACCTTCGGG (SEQ ID NO:263)
RLS19 (741-780)     CCAGCGACGATCTGCCTAAGATGTTCATCGAGTCCGACCC (SEQ ID NO:264)
RLS20 (781-820)     TGGGTTCTTTTCCAACGCTATTGTCGAGGGAGCTAAGAAG (SEQ ID NO:265)
RLS21 (821-860)     TTCCCTAACACCGAGTTCGTGAAGGTGAAGGGCCTCCACT (SEQ ID NO:266)
RLS22 (861-900)     TCAGCCAGGAGGACGCTCCAGATGAAATGGGTAAGTACAT (SEQ ID NO:267)
RLS23 (901-949)     CAAGAGCTTCGTGGAGCGCGTGCTGAAGAACGAGCAGTAATTCTAGAGC (SEQ ID NO:268)

Anti-sense Strand

Oligo name          Oligo sequence from 5' to 3'
RLAS1 (1-29)        GCTCTAGAATTACTGCTCGTTCTTCAGCA (SEQ ID NO:269)
RLAS2 (30-69)       CGCGCTCCACGAAGCTCTTGATGTACTTACCCATTTCATC (SEQ ID NO:270)
RLAS3 (70-109)      TGGAGCGTCCTCCTGGCTGAAGTGGAGGCCCTTCACCTTC (SEQ ID NO:271)
RLAS4 (110-149)     ACGAACTCGGTGTTAGGGAACTTCTTAGCTCCCTCGACAA (SEQ ID NO:272)
RLAS5 (150-189)     TAGCGTTGGAAAAGAACCCAGGGTCGGACTCGATGAACAT (SEQ ID NO:273)
RLAS6 (190-229)     CTTAGGCAGATCGTCGCTGGCCCGAAGGTAGGCGTTGTAG (SEQ ID NO:274)
RLAS7 (230-269)     TTGCGGACAATCTGGACGACGTCGGGCTTGCCTCCCTTAA (SEQ ID NO:275)
RLAS8 (270-309)     CGAGAGGGATCTCGCGAGGCCAGGAGAGGGTAGGCCGTCT (SEQ ID NO:276)
RLAS9 (310-349)     AACCTCGCCCTTCTCCTTGAATGGCTCCAGGTAGGCAGCG (SEQ ID NO:277)
RLAS10 (350-394)    AACTCCTCAGGCTCCAGTTTCCGCATGATCTTGCTTGGGAGCATG (SEQ ID NO:278)
RLAS11 (395-434)    GTCTCGACGAAGAAGTTATTCTCAAGCACCATTTTCTCGC (SEQ ID NO:279)
RLAS12 (435-474)    CCTCTTCGCTCTTGATCAGGGCGATATCCTCCTCGATGTC (SEQ ID NO:280)
RLAS13 (475-517)    AGGCCACTCGTCCCAGGACTCGATCACGTCCACGACACTCTCA (SEQ ID NO:281)
RLAS14 (518-559)    GCATGGACGATGGCCTTGATCTTGTCTTGGTGCTCGTAGGAG (SEQ ID NO:282)
RLAS15 (560-599)    TAGTGAAAGGCCAGACAAGCCCCCCAGTCGTGGCCCACAA (SEQ ID NO:283)
RLAS16 (600-639)    AGATGATTTTCTTTGGAAGGTTCAGCAGCTCGAACCAAGC (SEQ ID NO:284)
RLAS17 (640-679)    GGTGAGGTACTTGTAGTGATCCAGGAGGCGATATGAGCCA (SEQ ID NO:285)
RLAS18 (680-719)    TTCCCGCTCTTGCCGGACTTACCCATTCCGATCAGATCAG (SEQ ID NO:286)
RLAS19 (720-764)    GGATGATGCATCTAGCCACGGGCTCGATGTGAGGCACGACGTGCC (SEQ ID NO:287)
RLAS20 (765-804)    TCCACAGGTAGCTGGAGGCAGCGTTACCATGCAGAAAAAT (SEQ ID NO:288)
RLAS21 (805-849)    CACGGCGTTCTCGGCGTGCTTCTCGGAATCATAGTAGTTGATGAA (SEQ ID NO:289)
RLAS22 (850-889)    GGAGTCCAGCACGTTCATTTGCTTGCAGCGAGCCCACCAC (SEQ ID NO:290)
RLAS23 (890-929)    TGAGGCCCAGTGATCATGCGTTTGCGTTGCTCGGGGTCGT (SEQ ID NO:291)
RLAS24 (930-949)    ACACCTTGGAAGCCATGGTT (SEQ ID NO:292)
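The coordinates in Figure 10 appear to number each strand from its own 5' end over the same 949-nucleotide assembly, so an anti-sense position p would pair with sense position 950 - p. The sketch below illustrates that mapping under this assumption and checks it for RLAS24 (930-949) against the first 20 nucleotides of RLS1. The function names are hypothetical and the total length of 949 is taken from the last coordinate in the figure.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_to_sense(start: int, end: int, total: int = 949) -> tuple[int, int]:
    """Map a 1-based inclusive anti-sense range onto sense-strand coordinates,
    assuming both strands are numbered 5' to 3' over `total` nucleotides."""
    return total + 1 - end, total + 1 - start


# Oligonucleotides as listed in Figure 10.
RLS1 = "AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA"
RLAS24 = "ACACCTTGGAAGCCATGGTT"

sense_start, sense_end = antisense_to_sense(930, 949)
print(sense_start, sense_end)

# RLAS24 should pair with the corresponding stretch of the sense strand.
paired_region = RLS1[sense_start - 1:sense_end]
print(reverse_complement(paired_region) == RLAS24)
```

The same mapping places RLAS1 (1-29) opposite sense positions 921-949, which is the 3' end of RLS23.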
Figure 11

Alignment of the nucleotide sequences GRVER51.SEQ, LUCPPLYG.SEQ and RD1561H9.SEQ.

WO 02/16944    45/65    PCT/US01/26566
Figure 12
GRver5.1 DNA sequence of pGL3 vectors

ATGGTGAAACGCGAAAAGAACGTGATCTACGGCCCAGAACCACTGCATCC 50
ACTGGAAGACCTCACCGCTGGTGAGATGCTCTTCCGAGCACTGCGTAAAC 100
ATAGTCACCTCCCTCAAGCACTCGTGGACGTCGTGGGAGACGAGAGCCTC 150
TCCTACAAAGAATTTTTCGAAGCTACTGTGCTGTTGGCCCAAAGCCTCCA 200
TAATTGTGGGTACAAAATGAACGATGTGGTGAGCATTTGTGCTGAGAATA 250
ACACTCGCTTCTTTATTCCTGTAATCGCTGCTTGGTACATCGGGATGATT 300
GTCGCCCCTGTGAATGAATCTTACATCCCAGATGAGCTGTGTAAGGTTAT 350
GGGTATTAGCAAACCTCAAATCGTCTTTACTACCAAAAACATCTTGAATA 400
AGGTCTTGGAAGTCCAGTCTCGTACTAACTTCATCAAACGCATCATTATT 450
CTGGATACCGTCGAAAACATCCACGGCTGTGAGAGCCTCCCTAACTTCAT 500
CTCTCGTTACAGCGATGGTAATATCGCTAATTTCAAGCCCTTGCATTTTG 550
ATCCAGTCGAGCAAGTGGCCGCTATTTTGTGCTCCTCCGGCACCACTGGT 600
TTGCCTAAAGGTGTCATGCAGACTCACCAGAATATCTGTGTGCGTTTGAT 650
CCACGCTCTCGACCCTCGTGTGGGTACTCAATTGATCCCTGGCGTGACTG 700
TGCTGGTGTATCTGCCTTTCTTTCACGCCTTTGGTTTCTCTATTACCCTG 750
GGCTATTTCATGGTCGGCTTGCGTGTCATCATGTTTCGTCGCTTCGACCA 800
AGAAGCCTTCTTGAAGGCTATTCAAGACTACGAGGTGCGTTCCGTGATCA 850
ACGTCCCTTCAGTCATTTTGTTCCTGAGCAAATCTCCTTTGGTTGACAAG 900
TATGATCTGAGCAGCTTGCGTGAGCTGTGCTGTGGCGCTGCTCCTTTGGC 950
CAAAGAAGTGGCCGAGGTCGCTGCTAAGCGTCTGAACCTCCCTGGTATCC 1000
GCTGCGGTTTTGGTTTGACTGAGAGCACTTCTGCTAACATCCATAGCTTG 1050
CGAGACGAGTTTAAGTCTGGTAGCCTGGGTCGCGTGACTCCTCTTATGGC 1100
TGCAAAGATCGCCGACCGTGAGACCGGCAAAGCACTGGGCCCAAATCAAG 1150
TCGGTGAATTGTGTATTAAGGGCCCTATGGTCTCTAAAGGCTACGTGAAC 1200
AATGTGGAGGCCACTAAAGAAGCCATTGATGATGATGGCTGGCTCCATAG 1250
CGGCGACTTCGGTTACTATGATGAGGACGAACACTTCTATGTGGTCGATC 1300
GCTACAAAGAATTGATTAAGTACAAAGGCTCTCAAGTCGCACCAGCCGAA 1350
CTGGAAGAAATTTTGCTGAAGAACCCTTGTATCCGCGACGTGGCCGTCGT 1400
GGGTATCCCAGACTTGGAAGCTGGCGAGTTGCCTAGCGCCTTTGTGGTGA 1450
AACAACCCGGCAAGGAGATCACTGCTAAGGAGGTCTACGACTATTTGGCC 1500
GAGCGCGTGTCTCACACCAAATATCTGCGTGGCGGCGTCCGCTTCGTCGA 1550
TTCTATTCCACGCAACGTTACCGGTAAGATCACTCGTAAAGAGTTGCTGA 1600
AGCAACTCCTCGAAAAAGCTGGCGGC 1626
RDver5.1 DNA sequence of pGL3 vectors

ATGGTGAAGCGTGAGAAAAATGTCATCTATGGCCCTGAGCCTCTCCATCC 50
TTTGGAGGATTTGACTGCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGC 100
ACTCTCATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATCTTTG 150
AGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTGGCTCAGTCCCTCCA 200
CAATTGTGGCTACAAGATGAACGACGTCGTTAGTATCTGTGCTGAAAACA 250
ATACCCGTTTCTTCATTCCAGTCATCGCCGCATGGTATATCGGTATGATC 300
GTGGCTCCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAGTCAT 350
GGGTATCTCTAAGCCACAGATTGTCTTCACCACTAAGAATATTCTGAACA 400
AAGTCCTGGAAGTCCAAAGCCGCACCAACTTTATTAAGCGTATCATCATC 450
TTGGACACTGTGGAGAATATTCACGGTTGCGAATCTTTGCCTAATTTCAT 500
CTCTCGCTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCACTTCG 550
ACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCAGCGGTACTACTGGA 600
CTCCCAAAGGGAGTCATGCAGACCCATCAAAACATTTGCGTGCGTCTGAT 650
CCATGCTCTCGATCCACGCTACGGCACTCAGCTGATTCCTGGTGTCACCG 700
TCTTGGTCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTACTTTG 750
GGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTCCGCCGTTTTGATCA 800
GGAGGCTTTCTTGAAAGCCATCCAAGATTATGAAGTCCGCAGTGTCATCA 850
ACGTGCCTAGCGTGATCCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAG 900
TACGACTTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCACTGGC 950
TAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAATCTTCCAGGGATTC 1000
GTTGTGGCTTCGGCCTCACCGAATCTACCAGCGCTATTATTCAGTCTCTC 1050
CGCGATGAGTTTAAGAGCGGCTCTTTGGGCCGTGTCACTCCACTCATGGC 1100
TGCTAAGATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCGAACCAAG 1150
TGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCAAGGGTTATGTCAAT 1200
AACGTTGAAGCTACCAAGGAGGCCATCGACGACGACGGCTGGTTGCATTC 1250
TGGTGATTTTGGATATTACGACGAAGATGAGCATTTTTACGTCGTGGATC 1300
GTTACAAGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGCTGAG 1350
TTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGCGATGTCGCTGTGGT 1400
CGGCATTCCTGATCTGGAGGCCGGCGAACTGCCTTCTGCTTTCGTTGTCA 1450
AGCAGCCTGGTAAAGAAATTACCGCCAAAGAAGTGTATGATTACCTGGCT 1500
GAACGTGTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTGTTGA 1550
CTCCATCCCTCGTAACGTAACAGGCAAAATTACCCGCAAGGAGCTGTTGA 1600
AACAATTGTTGGAGAAGGCCGGCGGT 1626
RD1561H9 DNA sequence of pGL3 vectors

ATGGTAAAGCGTGAGAAAAATGTCATCTATGGCCCTGAGCCTCTCCATCC 50
TTTGGAGGATTTGACTGCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGC 100
ACTCTCATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATCTTTG 150
AGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTGGCTCAGTCCCTCCA 200
CAATTGTGGCTACAAGATGAACGACGTCGTTAGTATCTGTGCTGAAAACA 250
ATACCCGTTTCTTCATTCCAGTCATCGCCGCATGGTATATCGGTATGATC 300
GTGGCTCCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAGTCAT 350
GGGTATCTCTAAGCCACAGATTGTCTTCACCACTAAGAATATTCTGAACA 400
AAGTCCTGGAAGTCCAAAGCCGCACCAACTTTATTAAGCGTATCATCATC 450
TTGGACACTGTGGAGAATATTCACGGTTGCGAATCTTTGCCTAATTTCAT 500
CTCTCGCTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCACTTCG 550
ACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCAGCGGTACTACTGGA 600
CTCCCAAAGGGAGTCATGCAGACCCATCAAAACATTTGCGTGCGTCTGAT 650
CCATGCTCTCGATCCACGCTACGGCACTCAGCTGATTCCTGGTGTCACCG 700
TCTTGGTCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTACTTTG 750
GGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTCCGCCGTTTTGATCA 800
GGAGGCTTTCTTGAAAGCCATCCAAGATTATGAAGTCCGCAGTGTCATCA 850
ACGTGCCTAGCGTGATCCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAG 900
TACGACTTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCACTGGC 950
TAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAATCTTCCAGGGATTC 1000
GTTGTGGCTTCGGCCTCACCGAATCTACCAGTGCGATTATCCAGACTCTC 1050
GGGGATGAGTTTAAGAGCGGCTCTTTGGGCCGTGTCACTCCACTCATGGC 1100
TGCTAAGATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCGAACCAAG 1150
TGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCAAGGGTTATGTCAAT 1200
AACGTTGAAGCTACCAAGGAGGCCATCGACGACGACGGCTGGTTGCATTC 1250
TGGTGATTTTGGATATTACGACGAAGATGAGCATTTTTACGTCGTGGATC 1300
GTTACAAGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGCTGAG 1350
TTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGCGATGTCGCTGTGGT 1400
CGGCATTCCTGATCTGGAGGCCGGCGAACTGCCTTCTGCTTTCGTTGTCA 1450
AGCAGCCTGGTACAGAAATTACCGCCAAAGAAGTGTATGATTACCTGGCT 1500
GAACGTGTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTGTTGA 1550
CTCCATCCCTCGTAACGTAACAGGCAAAATTACCCGCAAGGAGCTGTTGA 1600
AACAATTGTTGGTGAAGGCCGGCGGT 1626
MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRVGTQLIPGVTVLVYLPFFHAFGFSITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSANIHSL 350
RDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGKEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLEKAGG 542

RDver5.1 protein sequence of pGL3 vectors

MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRYGTQLIPGVTVLVYLPFFHAFGFHITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSAIIQSL 350
RDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGKEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLEKAGG 542

RD1561H9 protein sequence of pGL3 vectors

MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRYGTQLIPGVTVLVYLPFFHAFGFHITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSAIIQTL 350
GDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGTEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLVKAGG 542
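The protein sequences above correspond to the DNA sequences of Figure 12 read in frame from the initial ATG. The following is a minimal sketch of that translation using the standard genetic code; the function name is hypothetical, and the example input is the first 36 nucleotides of the GRver5.1 DNA sequence as printed in Figure 12.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 36 nucleotides of the GRver5.1 sequence shown in Figure 12.
print(translate("ATGGTGAAACGCGAAAAGAACGTGATCTACGGCCCA"))
```

Translating a full DNA sequence and comparing the result with the printed protein sequence is a simple consistency check between the nucleotide and amino acid listings.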
SEQUENCE LISTING

<110> Promega Corporation
      Wood, Keith V.
      Gruber, Monika G.
      Zhuang, Yao
      Paguio, Aileen

<120> Synthetic nucleic acid molecule compositions and methods of
      preparation

<130> 341.005WO1

<150> US 09/645,706
<151> 2000-08-24

<160> 302

<170> FastSEQ for Windows Version 4.0

<210> 1
<211> 1629
<212> DNA
<213> Pyrophorus plagiophthalamus

<400> 1



atgatgaaga 


gagagaaaaa 


tgttatatat 


ggacccgaac 


ccctacaccc 


cttggaagac 


60 


ttaacagcag 


gagaaatgct 


cttcagggcc 


cttcgaaaac 


attctcattt 


accgcaggct 


120 


30ttagtagatg 


tgtttggtga 


cgaatcgctt 


tcctataaag 


agttttttga 


agctacatgc 


180 


ctcctagcgc 


aaagtctcca 


caattgtgga 


tacaagatga 


atgatgtagt 


gtcgatctgc 


240 


gccgagaata 


ataaaagatt 


ttttattccc 


attattgcag 


cttggtatat 


tggtatgatt 


300 


gtagcacctg 


ttaatgaaag 


ttacatccca 


gatgaactct 


gtaaggtcat 


gggtatatcg 


360 


aaaccacaaa 


tagttttttg 


tacaaagaac 


attttaaata 


aggtattgga 


ggtacagagc 


420 


3 5agaactaatt 


tcataaaaag 


gatcatcata 


cttgatactg 


tagaaaacat 


acacggttgt 


480 


gaaagtcttc 


ccaattttat 


ttctcgttat 


tcggatggaa 


atattgccaa 


cttcaaacct 


540 


ttacattacg 


atcctgttga 


gcaagtggca gctatcttat 


gttcgtcagg 


cactactgga 


600 


ttaccgaaag 


gtgtaatgca 


aactcaccaa 


aatatttgtg 


tccgacttat 


acatgcttta 


660 


gaccccaggg 


caggaacgca 


acttattcct 


ggtgtgacag 


tcttagtata 


tctgcctttt 


720 


40ttccatgctt 


ttgggttctc 


tataaacttg 


ggatacttca 


tggtgggtct 


tcgtgttatc 


780 


atgttaagac 


gatttgatca 


agaagcattt 


ctaaaagcta 


ttcaggatta 


tgaagttcga 


840 
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agtgtaatta acgttccagc aataatattg ttcttatcga aaagtccttt ggttgacaaa 900 

tacgatttat caagtttaag ggaattgtgt tgcggtgcgg caccattagc aaaagaagtt 960 

gctgaggttg cagtaaaacg attaaacttg ccaggaattc gctgtggatt tggtttgaca 1020 

gaatctactt cagctaatat acacagtctt ggggatgaat ttaaatcagg atcacttgga 1080 

Bagagttactc ctttaatggc agctaaaata gcagataggg aaactggtaa agcattggga 1140 

ccaaatcaag ttggtgaatt atgcgttaaa ggtcccatgg tatcgaaagg ttacgtgaac 1200 

aatgtagaag ctaccaaaga agctattgat gatgatggtt ggcttcactc tggagacttt 1260 

ggatactatg atgaggatga gcatttctat gtggtggacc gttacaagga attgattaaa 1320 

tataagggct ctcaggtagc acctgcagaa ctagaagaga ttttattgaa aaatccatgt 13 8 0 

lOatcagagatg ttgctgtggt tggtattcct gatctagaag ctggagaact gccatctgcg 1440 

tttgtggtta aacagcccgg aaaggagatt acagctaaag aagtgtacga ttatcttgcc 1500 

gagagggtct cccatacaaa gtatttgcgt ggaggggttc gattcgttga tagcatacca 1560 

aggaatgtta caggtaaaat tacaagaaag gaacttctga agcagttgct ggagaagagt . 1620 

tctaaactt 1629

<210> 2 
<211> 1626 
<212> DNA

<213> Artificial Sequence 

<220> 

<223> Sequence of clone YG#81-6G01 
<400> 2 

atgatgaagc gagagaaaaa tgttatatat ggacccgaac ccctacaccc cttggaagac 60

ttaacagctg gagaaatgct cttccgtgcc cttcgaaaac attctcattt accgcaggct 120 

ttagtagatg tggttggcga cgaatcgctt tcctataaag agttttttga agcgacagtc 180

ctcctagcgc aaagtctcca caattgtgga tacaagatga atgatgtagt gtcgatctgc 240

gccgagaata atacaagatt ttttattccc gttattgcag cttggtatat tggtatgatt 300

gtagcacctg ttaatgaaag ttacatccca gatgaactct gtaaggtgat gggtatatcg 360

aaaccacaaa tagtttttac gacaaagaac attttaaata aggtattgga ggtacagagc 420

agaactaatt tcataaaaag gatcatcata cttgatactg tagaaaacat acacggttgt 480

gaaagtcttc ccaattttat ttctcgttat tcggatggaa atattgccaa cttcaaacct 540 

ttacatttcg atcctgttga gcaagtggca gctatcttat gttcgtcagg cactactgga 600 

ttaccgaaag gtgtaatgca aactcaccaa aatatttgtg tccgacttat acatgcttta 660

gaccccaggg caggaacgca acttattcct ggtgtgacag tcttagtata tctgcctttt 720

ttccatgctt ttgggttctc tataaccttg ggatacttca tggtgggtct tcgtgttatc 780

atgttcagac gatttgatca agaagcattt ctaaaagcta ttcaggatta tgaagttcga 840

agtgtaatta acgttccatc agtaatattg ttcttatcga aaagtccttt ggttgacaaa 900

tacgatttat caagtttaag ggaattgtgt tgcggtgcgg caccattagc aaaagaagtt 960

gctgaggttg cagcaaaacg attaaacttg ccaggaattc gctgtggatt tggtttgaca 1020

gaatctactt cagctaatat acacagtctt agggatgaat ttaaatcagg atcacttgga 1080

agagttactc ctttaatggc agctaaaata gcagataggg aaactggtaa agcattggga 1140 

ccaaatcaag ttggtgaatt atgcattaaa ggtcccatgg tatcgaaagg ttacgtgaac 1200

aatgtagaag ctaccaaaga agctattgat gatgatggtt ggcttcactc tggagacttt 1260

ggatactatg atgaggatga gcatttctat gtggtggacc gttacaagga attgattaaa 1320 

tataagggct ctcaggtagc acctgcagaa ctagaagaga ttttattgaa aaatccatgt 1380 

atcagagatg ttgctgtggt tggtattcct gatctagaag ctggagaact gccatctgcg 1440 

tttgtggtta aacagcccgg aaaggagatt acagctaaag aagtgtacga ttatcttgcc 1500 

gagagggtct cccatacaaa gtatttgcgt ggaggggttc gattcgttga tagcatacca 1560

aggaatgtta caggtaaaat tacaagaaag gaacttctga agcagttgct ggagaaggcg 1620 

ggaggt 1626

<210> 3
<211> 1626
<212> DNA

<213> Artificial Sequence 

<220> 

<223> Sequence of a synthetic luciferase
<400> 3 

atgatgaaac gcgaaaagaa cgtcatctac ggcccagagc ctctgcaccc attggaagac 60 

ctgaccgccg gtgagatgtt gttccgtgct ctgcgtaaac attctcactt gcctcaagcc 120 

ctggtggatg tcgtgggcga cgaaagcttg tcttataagg agtttttcga agctactgtc 180

ctgttggccc agtctctgca taattgcggt tacaaaatga acgatgtggt cagcatttgt 240 

gctgagaata acacccgctt tttcatccca gtgattgccg cttggtacat cggcatgatt 300 

gtcgcccctg tgaatgaatc ttatatccca gacgagttgt gcaaggtcat gggtattagc 360

aaacctcaaa tcgtgtttac taccaagaac attctgaata aagtcttgga agtgcagtct 420 

cgtactaact tcatcaagcg cattatcatt ctggataccg tcgagaatat ccacggctgt 480

gaaagcttgc caaactttat ttctcgttat agcgacggta atatcgctaa cttcaagcct 540 

ctgcattttg atccagtgga gcaagtcgcc gctattttgt gctctagcgg cactaccggt 600 

ctgcctaaag gcgtgatgca gactcaccaa aatatctgtg tccgcttgat tcatgccctg 660 

gacccacgtg tgggtaccca gttgatccct ggcgtgactg tcctggtgta cttgccattc 720 

tttcacgcct tcggtttttc tattaccctg ggctatttca tggtcggttt gcgcgtgatc 780

atgtttcgtc gcttcgatca agaagctttt ctgaaggcca ttcaggacta cgaggtccgt 840 

agcgtgatca acgtcccttc tgtgattttg ttcctgagca aatctccatt ggtcgataag 900 

tatgacctga gctctttgcg cgaactgtgc tgtggcgctg cccctttggc taaagaggtg 960 

gccgaagtcg ctgccaagcg tctgaatttg ccaggtatcc gctgcggctt tggtctgact 1020 

gagagcacct ctgctaacat tcatagcttg cgtgatgaat tcaaatctgg cagcctgggt 1080

cgcgtgactc ctttgatggc cgctaagatc gccgaccgtg agaccggcaa agctctgggt 1140

ccaaatcaag tcggcgaatt gtgtattaag ggtcctatgg tgtctaaagg ctacgtcaac 1200

aatgtggagg ccactaagga agctatcgat gacgatggtt ggctgcacag cggcgacttt 1260 

ggttattacg atgaggacga acatttctat gtcgtggatc gctacaaaga gttgattaag 1320 

tataaaggct ctcaggtcgc cccagctgag ctggaagaga tcttgctgaa gaacccttgc 1380

attcgtgacg tggccgtcgt gggtatccca gatttggaag ctggcgagct gcctagcgcc 1440 

tttgtcgtga aacaaccagg taaggaaatt accgctaaag aggtctacga ctatttggcc 1500 

gaacgcgtgt ctcacactaa gtacctgcgt ggcggtgtcc gcttcgtgga tagcatccct 1560

cgcaatgtca ccggcaaaat tactcgtaag gagttgctga aacagttgct ggaaaaggct 1620 

ggtggc 1626



<210> 4 
<211> 1626 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> Sequence of a synthetic luciferase 



<400> 4



atgatgaaac 


gcgaaaagaa 


cgtcatctac 


ggcccagagc 


ctctgcaccc


attggaagac 


60 


ctgaccgctg 


gtgagatgtt 


gttccgtgct 


ctgcgtaaac 


attctcactt 


gcctcaagcc 


120 


ctggtcgatg 


tcgtgggcga 


cgagagcttg 


tcttataagg 


aatttttcga 


agctactgtc 


180 


ctgttggccc 


aatctctgca 


taattgcggt 


tacaaaatga 


acgatgtggt 


cagcatttgt 


240 


gctgagaata


acacccgctt 


tttcatccca 


gtgattgccg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg 


tgaatgaatc 


ttatatccca 


gacgagttgt 


gcaaggtcat 


gggtattagc 


360 


aaacctcaaa 


tcgtgtttac 


taccaagaac 


attctgaata 


aggtcttgga 


agtgcagtct 


420 


cgtactaact 


tcatcaagcg 


cattatcatt 


ctggataccg 


tcgagaatat 


ccacggctgt 


480 


gagagcttgc 


caaactttat 


ttctcgttat 


agcgacggta 


atatcgctaa 


cttcaagcct 


540 


ctgcattttg


atccagtgga 


gcaagtcgcc 


gctattttgt 


gctctagcgg 


caccaccggt 


600 


ctgcctaaag 


gcgtgatgca


gactcaccaa 


aatatctgtg 


tccgcttgat 


tcatgccctg 


660 


gacccacgtg 


tgggtactca 


gttgatccct 


ggcgtgactg 


tcctggtgta 


cttgccattc 


720 


tttcacgcct 


tcggtttttc 


tattaccctg 


ggctatttca 


tggtcggttt 


gcgcgtgatc 


780 


atgtttcgtc 


gcttcgatca 


agaagccttt 


ctgaaggcca 


ttcaagacta 


cgaggtccgt 


840 


agcgtgatca


acgtcccttc 


tgtgattttg 


ttcctgagca 


aatctccatt 


ggtcgataag 


900 


tatgacctga 


gcagcttgcg 


cgaactgtgc 


tgtggcgctg 


cccctttggc 


taaagaggtg 


960 


gccgaagtcg 


ctgccaagcg 


tctgaatttg 


ccaggtatcc 


gctgcggctt 


tggtctgact 


1020 


gagagcacct 


ctgctaacat 


tcatagcttg 


cgtgatgagt 


tcaaatctgg 


cagcctgggt 


1080 


cgcgtgactc 


ctttgatggc 


cgctaagatc 


gccgaccgtg 


agaccggcaa 


agctctgggt 


1140 


ccaaatcaag


tcggcgaatt 


gtgtattaag 


ggtcctatgg 


tgtctaaagg 


ctacgtcaac 


1200 
aatgtggagg ccactaagga agctattgat gacgatggtt ggctgcacag cggcgacttt 1260 

ggttattacg atgaggacga acatttctat gtcgtcgatc gctacaaaga gttgattaag 1320 

tataaaggct ctcaagtcgc cccagctgag ctggaagaaa tcttgctgaa gaacccttgc 1380

attcgtgacg tggccgtcgt gggtatccca ctggcgagct gcctagcgcc 1440

tttgtcgtga aacaaccagg caaggaaatt accgctaaag aggtctacga ctatttggcc 1500

gagcgcgtgt ctcacactaa gtacctgcgt ggcggtgtcc gcttcgtcga tagcatccct 1560 

cgcaatgtca ccggcaaaat tactcgtaag gagttgctga aacagttgct ggaaaaggct 1620 

ggtggc 1626



<210> 5

<211> 1626

<212> DNA

<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase



<400> 5 



atgatgaaac 


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg


gtgagatgct 


gttccgtgcc 


ctgcgtaaac 


atagccacct 


gcctcaagct 


120 


ctcgtggacg 


tcgtgggtga 


cgagagcctg 


tcttacaaag 


aatttttcga 


agctactgtg 


180 


ctgttggccc 


aaagcctgca 


taattgtggt 


tacaaaatga 


acgatgtggt 


gagcatctgt 


240 


gctgagaata 


acactcgctt 


ttttatccct 


gtgatcgctg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg 


tgaatgaatc 


ttacatccca 


gatgagttgt 


gtaaggtgat 


gggtattagc 


360 


aaacctcaaa


tcgtctttac 


taccaaaaac 


atcctgaata 


aggtcttgga 


agtccagtct 


420 


cgtactaatt 


tcatcaaacg 


cattattatt 


ctggataccg 


tcgaaaacat 


ccacggctgt 


480 


gagagcttgc 


ctaactttat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaagcca 


540 


ctgcattttg 


atccagtcga 


gcaggtcgcc 


gccattttgt 


gctcttctgg 


caccactggt 


600 


ttgcctaaag 


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgcttgat 


ccacgccctc 


660 


gaccctcgtg


tgggtactca 


attgatccct 


ggcgtgactg 


tgctggtgta 


tttgcctttc 


720 


tttcacgcct 


ttggtttttc 


tatcaccctg 


ggctatttca 


tggtcggctt 


gcgtgtgatc 


780 


atgtttcgtc 


gcttcgacca 


agaagccttc 


ctgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tctgtgatca 


atgtcccatc 


tgtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga 


gcagcttgcg 


tgaactgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg


ctgctaagcg 


tctgaacctc 


cctggtatcc 


gctgcggttt 


tggtttgact 


1020 


gagagcactt 


ctgccaacat 


ccatagcttg 


cgtgacgagt 


ttaaatctgg 


tagcctgggt 


1080 


cgcgtgaccc 


ctttgatggc 


tgcaaagatc 


gccgaccgtg 


agaccggcaa 


agccctgggc 


1140 


ccaaatcagg 


tcggtgaatt 


gtgcattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200 


aatgtggagg 


ccactaaaga 


agctattgat 


gatgatggtt 


ggttgcatag 


cggcgacttc 


1260 


ggttattatg


atgaggacga 


acacttctat 


gtggtcgatc 


gctataaaga 


attgattaag 


1320 
tacaaaggct ctcaagtcgc cccagctgaa ctggaagaaa ttttgctgaa gaacccttgt 1380

attcgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacctgg caaggagatt actgctaagg aggtctacga ctatttggcc 1500

gagcgcgtgt ctcacactaa atatctgcgt ggcggcgtcc gcttcgtcga ttctatccct 1560

cgcaacgtca ccggcaagat cactcgtaaa gagttgctga aacaattgct cgaaaaagct 1620

ggcggc 1626



<210> 6 
<211> 1626 
<212> DNA

<213> Artificial Sequence 

<220> 

<223> Sequence of a synthetic luciferase

<400> 6 



atgatgaaac 


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg 


gtgagatgct 


cttccgtgca 


ctgcgtaaac 


atagtcacct 


ccctcaagct 


120


ctcgtggacg 


tcgtgggaga 


cgagagcctc 


tcttacaaag 


aatttttcga 


agctactgtg 


180 


ctgttggccc


aaagcctcca 


taattgtgga 


tacaaaatga 


acgatgtggt 


gagcatttgt 


240 


gctgagaata 


acactcgctt 


ctttatccct 


gttatcgctg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg 


tgaatgaatc 


ttacatccca 


gatgagctgt 


gtaaggttat gggtattagc 


360 


aaacctcaaa 


tcgtctttac 


taccaaaaat 


atcctgaata 


aggtcttgga 


agtccagtct 


420 


cgtactaact 


tcatcaaacg 


catcattatt 


ctggataccg 


tcgaaaacat 


ccatggctgt 


480 


gagagcctgc


ctaacttcat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaaacca 


540 


ctgcattttg 


atccagtcga 


gcaagtggcc 


gctattttgt 


gctcttccgg 


caccactggt 


600 


ttgcctaaag 


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgtttgat 


ccacgctctc 


660 


gaccctcgtg 


tgggtactca 


attgatccct 


ggcgtgactg 


tgctggtgta 


tctgcctttc 


720 


tttcacgcct 


ttggtttttc 


tattaccctg 


ggctatttca


tggtcggctt 


gcgtgtcatc 


780 


atgtttcgtc


gcttcgacca 


agaagccttc 


ttgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tctgtcatca 


atgtcccttc 


agtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga 


gcagcttgcg 


tgagctgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg 


ctgctaagcg 


tctgaacctc 


cctggtatcc gctgcggttt tggtttgact 


1020 


gagagcactt 


ctgctaacat 


ccatagcttg 


cgagacgagt 


ttaagtctgg 


tagcctgggt 


1080 


cgcgtgactc


ctcttatggc 


tgcaaagatc 


gccgaccgtg 


agaccggcaa 


agcactgggc 


1140


ccaaatcaag 


tcggtgaatt 


gtgtattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200


aatgtggagg 


ccactaaaga 


agccattgat 


gatgatggct 


ggctccatag 


cggcgacttc 


1260 


ggttactatg 


atgaggacga 


acacttctat 


gtggtcgatc 


gctacaaaga 


attgattaag 


1320 


tacaaaggct 


ctcaagtcgc 


cccagccgaa 


ctggaagaaa 


ttttgctgaa 


gaacccttgt 


1380 


atccgcgacg


tggccgtcgt 


gggtatccca 


gacttggaag 


ctggtgagtt gcctagcgcc 


1440 
tttgtggtga aacaacctgg aaaggagatc actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttccatccca 1560 
cgcaacgtga ccggtaagat cactcgtaaa gaattgctga agcaactcct cgaaaaagct 1620 

ggcggc 1626

<210> 7 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Sequence of a synthetic luciferase



<400> 7 

atgatgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60
ctcaccgctg gtgagatgct cttccgagca ctgcgtaaac atagtcacct ccctcaagca 120
ctcgtggacg tcgtgggaga cgagagcctc tcctacaaag aatttttcga agctactgtg 180
ctgttggccc aaagcctcca taattgtggg tacaaaatga acgatgtggt gagcatttgt 240
gctgagaata acactcgctt ctttattcct gtaatcgctg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360
aaacctcaaa tcgtctttac taccaaaaac atcttgaata aggtcttgga agtccagtct 420
cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccacggctgt 480
gagagcctcc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaagccc 540
ttgcattttg atccagtcga gcaagtggcc gctattttgt gctcctccgg caccactggt 600
ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660
gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tctgcctttc 720
tttcacgcct ttggtttctc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780
atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840
tccgtgatca acgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900
tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960
gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020
gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080
cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140
ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200
aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260
ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 1320
tacaaaggct ctcaagtcgc accagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacccgg caaggagatc actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttctattcca 1560
cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620
ggcggc 1626

<210> 8
<211> 1626
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase



<400> 8 



atgatgaaac 


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60


ctcaccgctg 


gtgagatgct 


cttccgagca 


ctgcgtaaac 


atagtcacct 


ccctcaagca 


120


ctcgtggacg


tcgtgggaga 


cgagaacctc 


tcctacaaag 


aatttttcga 


agctactgtg 


180


ctgttggccc 


aaagcctcca 


taattgtggg 


tacaaaatga 


acgatgtggt 


gagcatttgt


240


gctgagaata 


acactcgctt 


ctttattcct 


gtaatcgctg 


cttggtacat 


cggcatgatt 


300


gtcgcccctg 


tgaatgaatc 


ttacatccca 


gatgagctgt 


gtaaggttat 


gggtattagc 


360


aaacctcaaa 


tcgtctttac 


taccaaaaac 


atcttgaata 


aggtcttgga 


agtccagtct 


420 


cgtactaact


tcatcaaacg 


catcattatt 


ctggataccg 


tcgaaaacat 


ccacggctgt 


480


gagagcctcc 


ctaacttcat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaagccc 


540 


ttgcattttg 


atccagtcga 


gcaagtggcc 


gctattttgt 


gctcctccgg 


caccactggt 


600 


ttgcctaaag 


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgtttgat 


ccacgctctc 


660 


gaccctcgtg 


tgggtactca 


attgatctct 


ggcgtgactg 


tgctggtgta 


tctgcctttc 


720 


tttcacgcct


ttggtttctc 


tattaccctg 


ggctatttca 


tggtcggctt 


gcgtgtcatc 


780 


atgtttcgtc 


gcttcgacca 


agaagccttc 


ttgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tccgtgatca 


acgtcccttc 


agtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga 


gcagcttgcg 


tgagctgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg 


ctgctaagcg 


tctgaacctc 


cctggtatcc 


gctgcggttt 


tggtttgact 


1020 


gagagcactt


ctgctaacat 


ccatagcttg 


cgagacgagt 


ttaagtctgg 


tagcctgggt 


1080 


cgcgtgactc 


ctcttatggc 


tgcaaagatc 


gccgaccgtg 


agaccggcaa 


agcactgggc 


1140 


ccaaatcaag 


tcggtgaatt 


gtgtattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200 


aatgtggagg 


ccactaaaga 


agccattgat 


gatgatggct 


ggctccatag 


cggcgacttc 


1260 


ggttactatg 


atgaggacga 


acacttctat 


gtggtcgatc gctacaaaga attgattaag 


1320 


tacaaaggct


ctcaagtcgc 


accagccgaa 


ctggaagaaa 


ttttgctgaa 


gaacccttgt 


1380 


atccgcgacg 


tggccgtcgt 


gggtatccca 


gacttggaag 


ctggcgagtt gcctagcgcc 


1440 


tttgtggtga 


aacaacccgg 


caaggagatc 


actgctaagg 


aggtctacga 


ctatttggcc 


1500 


gagcgcgtgt 


ctcacaccaa 


atatctgcgt 


ggcggcgtcc 


gcttcgtcga 


ttctattcca 


1560 


cgcaacgtta 


ccggtaagat 


cactcgtaaa 


gagttgctga 


agcaactcct 


cgaaaaagct 


1620 


ggcggc












1626 
<210> 9
<211> 1626 
<212> DNA 

<213> Artificial Sequence

<220> 

<223> Sequence of a synthetic luciferase



<400> 9 



atgatgaaac


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg 


gtgagatgct 


cttccgagca 


ctgcgtaaac 


atagtcacct 


ccctcaagca 


120 


ctcgtggacg 


tcgtgggaga 


cgagagcctc 


tcctacaaag 


aatttttcga 


agctactgtg 


180 


ctgttggccc 


aaagcctcca 


taattgtggg 


tacaaaatga 


acgatgtggt 


gagcatttgt 


240 


gctgagaata 


acactcgctt 


ctttattcct 


gtaatcgctg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg tgaatgaatc


ttacatccca gatgagctgt gtaaggttat gggtattagc 


360 


aaacctcaaa 


tcgtctttac 


taccaaaaac 


atcttgaata 


aggtcttgga 


agtccagtct 


420 


cgtactaact 


tcatcaaacg 


catcattatt 


ctggataccg 


tcgaaaacat 


ccacggctgt 


480 


gagagcctcc 


ctaacttcat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaagccc 


540 


ttgcattttg 


atccagtcga 


gcaagtggcc 


gctattttgt 


gctcctccgg 


caccactggt 


600 


ttgcctaaag


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgtttgat 


ccacgctctc 


660 


gaccctcgtg 


tgggtactca 


attgatccct 


ggcgtgactg 


tgctggtgta 


tctgcctttc 


720 


tttcacgcct 


ttggtttctc 


tattaccctg 


ggctatttca 


tggtcggctt 


gcgtgtcatc 


780 


atgtttcgtc 


gcttcgacca 


agaagccttc 


ttgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tccgtgatca 


acgtcccttc 


agtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga


gcagcttgcg 


tgagctgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg 


ctgctaagcg 


tctgaacctc 


cctggtatcc 


gctgcggttt 


tggtttgact 


1020 


gagagcactt 


ctgctaacat 


ccatagcttg 


cgagacgagt 


ttaagtctgg 


tagcctgggt 


1080 


cgcgtgactc 


ctcttatggc 


tgcaaagatc 


gccgaccgtg 


agaccggcaa 


agcactgggc


1140 


ccaaatcaag 


tcggtgaatt 


gtgtattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200 


aatgtggagg


ccactaaaga 


agccattgat 


gatgatggct 


ggctccatag 


cggcgacttc 


1260 


ggttactatg 


atgaggacga 


acacttctat 


gtggtcgatc 


gctacaaaga 


attgattaag 


1320 


tacaaaggct 


ctcaagtcgc 


accagccgaa 


ctggaagaaa 


ttttgctgaa 


gaacccttgt 


1380 


atccgcgacg tggccgtcgt 


gggtatccca 


gacttggaag 


ctggcgagtt 


gcctagcgcc 


1440 


tttgtggtga 


aacaacccgg 


caaggagatc 


actgctaagg 


aggtctacga 


ctatttggcc 


1500 


gagcgcgtgt


ctcacaccaa 


atatctgcgt 


ggcggcgtcc 


gcttcgtcga 


ttctattcca


1560 


cgcaacgtta 


ccggtaagat 


cactcgtaaa 


gagttgctga 


agcaactcct 


cgaaaaagct 


1620 


ggcggc 












1626 



<210> 10 
<211> 1626
<212> DNA

<213> Artificial Sequence 

<220>

<223> Sequence of a synthetic luciferase
<400> 10 



atgatgaagc 


gtgagaaaaa 


tgtgatttat ggtcctgaac 


cattgcatcc 


tctggaggat 


60 


ttgactgctg gcgaaatgct 


gtttcgcgcc 


ttgcgcaagc 


acagccatct 


gccacaggct 


120 


ttggtcgacg


tggtcggtga 


tgagtctctg agctacaaag 


aattctttga 


ggccaccgtg 


180 


ttgctggctc 


aaagcttgca 


caactgtggc 


tataagatga 


atgacgtcgt 


gtctatctgc 


240 


gccgaaaaca 


atactcgttt 


ctttattcct 


gtcatcgctg 


cctggtatat 


tggtatgatc 


300 


gtggctccag 


tcaacgagag 


ctacattcct 


gatgaactgt 


gtaaagtgat 


gggcatctct 


360


aagccacaga 


ttgtcttcac 


cactaaaaat 


atcttgaaca 


aggtgctgga 


ggtccaaagc 


420 


cgcaccaatt


ttattaaacg 


tatcattatc 


ttggacactg 


tggaaaacat 


tcatggttgc 


480 


gagtctctgc 


ctaatttcat 


cagccgctac 


tctgatggca 


acattgccaa 


ttttaaacca 


540 


ttgcacttcg 


accctgtcga 


acaggtggct gccatcctgt gtagctctgg taccactggc 


600 


ttgccaaagg 


gtgtcatgca 


aacccatcag 


aacatttgcg 


tgcgtctgat 


ccacgctctc 


660 


gatcctcgct 


acggcactca 


actgattcca 


ggtgtcaccg 


tgttggtcta 


tctgcctttt 


720 


ttccatgctt


ttggcttcca 


catcactttg 


ggttacttta 


tggtgggcct 


gcgtgtcatt 


780 


atgttccgcc 


gttttgacca 


ggaggccttc 


ttgaaagcta 


tccaagatta 


tgaagtgcgc 


840 


tctgtcatta 


atgtgccaag 


cgtcatcctg 


tttttgtcta 


agagccctct 


ggtggacaaa 


900 


tacgatttgt 


ctagcctgcg 


tgagttgtgt 


tgcggtgccg 


ctccactggc 


caaggaagtc 


960 


gctgaggtgg 


ccgctaaacg 


cttgaacctg 


cctggcattc 


gttgtggttt 


cggcttgacc 


1020 


gaatctacta


gcgccattat 


ccaatctctg 


cgcgacgagt 


ttaagagcgg 


ttctttgggc 


1080 


cgtgtcaccc 


cactgatggc 


tgccaaaatt 


gctgatcgcg 


aaactggtaa 


ggccttgggc 


1140 


cctaaccagg 


tgggtgagct 


gtgcatcaaa ggcccaatgg 


tcagcaaggg 


ttatgtgaat 


1200 


aacgtcgaag 


ctaccaaaga 


ggccattgac 


gatgacggct 


ggttgcattc 


tggtgatttc 


1260 


ggctactatg 


acgaagatga 


gcacttttac 


gtggtcgacc 


gttataagga 


actgatcaaa 


1320 


tacaagggta


gccaagtggc 


tcctgccgaa 


ttggaggaaa 


ttctgttgaa 


aaatccatgt 


1380 


atccgcgatg 


tcgctgtggt 


cggcattcct 


gacctggagg 


ccggtgaatt 


gccatctgct 


1440


ttcgtggtca 


agcagcctgg 


caaagagatc 


actgccaagg 


aagtgtatga 


ttacctggct 


1500 


gagcgtgtca gccataccaa 


atatttgcgc 


ggtggcgtgc 


gttttgtcga 


ctctattcca 


1560 


cgtaacgtga 


ctggtaagat 


cacccgcaaa gaactgttga 


agcaactgtt 


ggagaaagcc 


1620 


ggcggt












1626 



<210> 11 
<211> 1626 
<212> DNA 
<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase 



<400> 11














«J Cl L> ^ Cl u> ^ Ctd^ w 


y Ay AAAAA 


tcrtaatttat 

^y L«y A w v> b« A L> 


aatcctaaac 


cattgcatcc 


tctggaggat 


60 


t.i*cfactcfCGa 

la* k»V^ClW \^ 


y ^y AAA L«y ^ 


crtfctcaccrcc 


ttcTcacaaQC 


acagccatct 


gccacaagct 


120


t" t" a cr cr a a f Cf 


4- rTrrt" ocrcf 1" era 
^yy ^^9^ ^yci 


"hcraafcctetcr 

wy ci> A 1^ » w \^ ^y 


aactacaaacr 


agttctttga 


ggcaaccgtg 


180 




a Cf a cf t* t* erf a 
Ay Ay k>> ^y v** a 


V* AAv^ ^y *»y y *^ 


tataagatga 


atgacgtcgt 


gtctatctgc 


240 




a ^ a f 1" r«cit tt 


ctttattcct


gtcatcgctg


cctggtatat 


tggtatgatc 


300 




\mi> \^ cLci y ^y y 


ctacattcct 


gatgaactgt 


gtaaagtgat 


gggcatctct 


360 


O "iV^ . V-p CI w Ca CT. 


t"1*crtcttnac 

W w ^ \^ \^ ^ ^ ^ 


cactaaaaat 


atcttgaaca 


aagtgctgga 


ggtccaaagc 


420 


Vp<^^ \mr CL w CLCl ^ 


't~+*at*t*aaar*cr 

L« L^A U'AAAt^y 


tatcattatc


ttggacactg 


tggaaaacat 


tcatggttgc 


480 


y ACL W w L» W Up y v«. 


r*t" a a t* t* t* cat 


caaccactac 


tctgatggca 


acattgccaa ttttaaacca 


540 


L»l»y^MW u w>^y 


A^^(^ ^y ^^y ^ 


a c acr cf t acf c t 

AwAyy i»iyy w4» 


gccatcctgt 


guagcucuyy 


+" sa 0 1 a #^ t rtn r* 
K^ctK^ L>A(^uyy u> 


600 




crfcfi" ea tcrca 
y i»y L* w A ^y ^ A 


aanccatGacr 


a.acatttcrccr 


tgcgtctgat 


ccacgctctc 


660 




A wy y ^ A^w^ A 


actcrat tcet 


cTcrtcrtcacccr 

y y k>y i^wa^t^v^v^ 


tgttggtcta 


tctgcctttt 


720 




L« L«y y ^ i» A 


catoaG tttCT 

CL w ^ w wy 


acrttacttta 


tggtgggcct 


gcgtgtcatt 


780 




y L> L> L» u>y A^V« A 


crcra era c 1 1 1 G 
y y ciy y ^ Ui i*^^ 


ttcraaa.aG ta 


tccaagatta 


tgaagtgcgc 


840 


t~ /~« ^ 2a +" 'H 25 
L.^ Uy UOaUL-A 


A L.y L>y A Ay 


^ y u« A L> ^ w u-y 


tttttatcta 


agagccctct 


ggtggacaaa 


900 






t cracf 1 1~ crt crt 
^y «y L- uy L-y L- 


t cr c era t cr c c cr 

u,y \_ry y wy <^ v^y 


ctccactggc 


caaggaagtc 


960 




w i.^y 0 w AA A^a^y 


L» L. U.yAAV.^Vii« 


Gctaacattc 


gttgtggttt 


cggcttgacc 


1020 


y A A L> \a> i» A ^ L« A 


cf Gar*cia.ttat 


ccaa.tctctcr 


cgcgacgaat 


ttaagagcgg 


ttctttgggc 


1080 


r'crt* crt" 03^00 


cac tcrafccfcrc 


tgccaaaatt 


QCtcratCQCcr 

yv.'»*yci.^v.-y^y 


aaactggtaa 


ggccttgggc 


1140 


cctaaccagg 


tgggtgagct 


gtgcatcaaa 


ggcccaacgg 


tcagcaaggg 


ttatgtgaat 


1200


aacgtcgaag


ctaccaaaga 


ggccatcgac 


gatgacggct 


ggttgcattc 


tggtgatttc 


1260 


ggctactatg 


acgaagatga 


gcacttttac 


gtggtggacc 


gttataagga 


actgatcaaa 


1320 


tacaagggta 


gccaagtggc 


tcctgccgaa 


ttggaggaga 


ttctgttgaa 


aaatccatgt 


1380 


atccgcgatg 


tcgctgtggt 


cggcattcct 


gacctggagg 


ccggtgaatt 


gccatctgct 


1440 


ttcgtggtca 


agcagcctgg 


taaagagatc 


actgccaagg 


aagtgtatga 


ttacctggct 


1500 


gaacgtgtca


gccataccaa 


atatttgcgc 


ggtggcgtgc 


gttttgtgga 


ctctattcca 


1560 


cgtaacgtga 


ctggtaagat 


cacccgcaaa 


gaactgttga 


agcaactgtt 


ggagaaagcc 


1620 


ggcggt 












1626 



<210> 12 
<211> 1626
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase



WO 02/16944
<400> 12 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctttgcaccc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgcgct ttgcgtaagc actctcattt gcctcaagcc 120 

ttggtcgatg tggtcggcga tgaatctttg agctataagg agttttttga ggcaaccgtc 180

ttgctggctc agtctttgca taattgcggc tacaagatga acgacgtcgt ctctatttgt 240

gccgaaaaca atacccgttt cttcattcca gtcatcgccg cctggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattcct gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtgttcac cactaagaat attttgaaca aagtgctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcatggttgc 480 

gaatctctgc ctaatttcat tagccgctat tctgacggca acatcgccaa ctttaaacct 540

ttgcatttcg accctgtgga acaagtggct gctatcctgt gtagcagcgg tactactggc 600 

ctcccaaagg gcgtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cctgcctttc 720 

ttccatgctt tcggcttcca cattactttg ggttacttta tggtcggtct gcgtgtcatt 780 

atgttccgcc gttttgatca ggaggctttt ttgaaagcca tccaagatta tgaagtccgc 840

agcgtcatta acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 
tacgacttgt cttccctgcg tgagttgtgt tgcggtgccg ccccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctg ccaggcattc gttgtggctt cggcctcacc 1020 

gaatctacca gcgctattat tcaatctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaaaatc gctgatcgcg aaactggtaa ggctttgggc 1140

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200 

aacgtcgaag ctaccaagga ggccatcgac gacgacggct ggctgcattc tggtgatttt 1260 

ggctactacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggtggc tccagccgag ttggaggaga ttctgttgaa aaatccatgc 1380 

atccgtgatg tcgctgtggt cggcattcct gatctggagg ccggtgaact gccttctgct 1440

ttcgtcgtca agcagcctgg taaagaaatc accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccataccaa gtacttgcgt ggcggcgtgc gttttgtgga cagcattcca 1560 

cgtaatgtga ctggtaaaat tacccgcaag gaactgttga agcaattgtt ggagaaggcc 1620

ggcggt 1626 

<210> 13 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Sequence of a synthetic luciferase 



<400> 13 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctttgcatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ttgcgtaaac actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccttgca taattgtggc tacaagatga acgacgtcgt ctccatttgt 240

gcagaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaattttat tagccgctat tcagacggaa acatcgccaa ctttaagcct 540 

ctccatttcg accctgtgga acaagttgct gcaatcctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720 

ttccatgctt tcggcttcca tattactttg ggttacttta tggtcggtct gcgtgtgatt 780 

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840 

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctg cccggcattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140 

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200 

aacgtcgaag ctaccaagga ggctatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380
attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgtgga tagcattcct 1560

cgcaatgtga ctggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626



<210> 14 
<211> 1626
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 14 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 
ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120 
ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180
ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840 

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140 

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgtcgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260 

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440 

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560 

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620 

ggcggt 1626

<210> 15
<211> 1626
<212> DNA
<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 15

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60
ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actcttattt gcctcaagcc 120
ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180
ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240
gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300
gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360
aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420
cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480
gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540
ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600
ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660
gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720
ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780
atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840
agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900
tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960
gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020
gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080
cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140
ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200
aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260
ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320
tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380
attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440
ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500
gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560
cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620
ggcggt 1626


<210> 16
<211> 1626
<212> DNA
<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 16

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60
ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc CX W Xm» O Vm^ C4. Li* O w gcctcaagcc 120
ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180
ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240
gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300
gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360
aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420
cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480
gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540
ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600
ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660
gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720
ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780
atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840
agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900
tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960
gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020
gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080
cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140
ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200
aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260
ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320
tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380
attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440
ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500
gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560
cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620
ggcggt 1626



<210> 17 
<211> 1626 
<212> DNA

<213> Artificial Sequence 

<220> 

<223> Sequence of a synthetic luciferase 


<400> 17 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480 

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540
ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020 

gaatctacca gcgctattat tcagtctctc ggggatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200 

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260 

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320 

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 18 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 

<220>

<223> Sequence of a synthetic luciferase
<400> 18

atgataaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480 

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660
gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720
ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780
atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840
agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgugyaca.dg 900
tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc uaaggaggcc 960
gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcc ucawc 1020
gaatctacca gtgcgattat ccagactctc ggggatgagt uraagagcgg 1080
cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggcctcgggc 1140
ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200
aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260
ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320
tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380
attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440
ttcgttgtca agcagcctgg tacagaaatt accgccaaag aagtgtatga ttacctggct 1500
gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560
cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggtgaaggcc 1620
ggcggt 1626



<210> 19 

<211> 933

<212> DNA

<213> Renilla reniformis 



<400> 19 



atgacttcga aagtttatga tccagaacaa aggaaacgga tgataactgg tccgcagtgg 60
tgggccagat gtaaacaaat gaatgttctt gattcattta ttaattatta tgattcagaa 120
aaacatgcag aaaatgctgt tattttttta catggtaacg cggcctcttc ttatttatgg 180
cgacatgttg tgccacatat tgagccagta gcgcggtgta ttataccaga tcttattggt 240
atgggcaaat caggcaaatc tggtaatggt tcttataggt tacttgatca ttacaaatat 300
cttactgcat ggtttgaact tcttaattta ccaaagaaga tcatttttgt cggccatgat 360
tggggtgctt gtttggcatt tcattatagc tatgagcatc aagataagat caaagcaata 420
gttcacgctg aaagtgtagt agatgtgatt gaatcatggg atgaatggcc tgatattgaa 480
gaagatattg cgttgatcaa atctgaagaa ggagaaaaaa tggttttgga gaataacttc 540
ttcgtggaaa ccatgttgcc atcaaaaatc atgagaaagt tagaaccaga agaatttgca 600
gcatatcttg aaccattcaa agagaaaggt gaagttcgtc gtccaacatt atcatggcct 660
cgtgaaatcc cgttagtaaa aggtggtaaa cctgacgttg tacaaattgt taggaattat 720
aatgcttatc tacgtgcaag tgatgattta ccaaaaatgt ttattgaatc ggatccagga 780
ttcttttcca atgctattgt tgaaggcgcc aagaagtttc ctaatactga atttgtcaaa 840
gtaaaaggtc ttcatttttc gcaagaagat gcacctgatg aaatgggaaa atatatcaaa 900
tcgttcgttg agcgagttct caaaaatgaa caa 933

<210> 20
<211> 933 
<212> DNA 



<213> Artificial Sequence 

<220>

<223> Sequence of a synthetic luciferase
<400> 20

atggcttcca aggtgtacga ccccgagcag cgcaagcgca tgatcaccgg ccctcagtgg 60

tgggcccgct gcaagcagat gaacgtgctg gactccttca tcaactacta cgacagcgag 120 

aagcacgccg agaacgccgt gatcttcctg cacggcaacg ccgcctccag ctacctgtgg 180 

aggcacgtgg tgcctcacat cgagcccgtg gcccgctgca tcatccctga cctgatcggc 240 

atgggcaagt ccggcaagag cggcaacggc tcctaccgcc tgctggacca ctacaagtac 300

ctgaccgcct ggttcgagct gctgaacctg cccaagaaga tcatcttcgt gggccacgac 360

tggggagcct gcctggcctt ccactactcc tacgagcacc aggacaagat caaggccatc 420 

gtgcacgccg agagcgtggt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480 

gaggacatcg ccctgatcaa gagcgaggag ggcgagaaga tggtgctgga gaacaacttc 540 

ttcgtggaga ccatgctgcc cagcaagatc atgcgcaagc tggagcctga ggagttcgcc 600 

gcctacctgg agcccttcaa ggagaagggc gaggtgcgcc gccctaccct gtcctggccc 660

cgcgagatcc ctctggtgaa gggcggcaag cccgacgtgg tgcagatcgt gcgcaactac 720

aacgcctacc tgcgcgccag cgacgacctg cctaagatgt tcatcgagtc cgaccctggc 780 

ttcttctcca acgccatcgt cgagggagcc aagaagttcc ccaacaccga gttcgtgaag 840 

gtgaagggcc tgcacttctc ccaggaggac gcccctgacg agatgggcaa gtacatcaag 900 

agcttcgtgg agcgcgtgct gaagaacgag cag 933

<210> 21 
<211> 933 
<212> DNA 
<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase
<400> 21

atggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 60

tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 120 

aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 180 

aggcacgtcg tgcctcacat cgagcccgtg gctcgctgca tcatccctga tctgatcgga 240 

atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 300
ctcaccgctt ggttcgagct gctgaacctt t-ratctfetat gggccacgac 360
tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 420
gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480
gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 540
ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 600
gcctacctgg agcccttcaa ggagaagggc gaggttagac ggcctaccct 660
cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 720
aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 780
ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 840
gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 900
agcttcgtgg agcgcgtgct gaagaacgag cag 933



<210> 22 
<211> 933 
<212> DNA

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 22

atggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 60
tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 120
aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 180
aggcacgtcg tgcctcacat cgagcccgtg gctagatgca tcatccctga tctgatcgga 240
atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 300
ctcaccgctt ggttcgagct gctgaacctt ccaaagaaaa tcatctttgt gggccacgac 360
tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 420
gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480
gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 540
ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 600
gcctacctgg agccattcaa ggagaagggc gaggttagac ggcctaccct ctcctggcct 660
cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 720
aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 780
ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 840
gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 900
agcttcgtgg agcgcgtgct gaagaacgag cag 933



<210> 23 
<211> 543
<212> PRT

<213> Pyrophorus plagiophthalamus

<400> 23

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Phe Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Cys Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Lys Arg Phe Phe Ile Pro Ile Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Cys Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Tyr Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Ala
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Asn Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Leu Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ala Ile
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Val Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Val Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ser Ser Lys Leu
530 535 540



<210> 24 
<211> 542
<212> PRT

<213> Artificial Sequence

<220>

<223> Sequence of clone YG#81-6G01
<400> 24

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Ala
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 25 
<211> 542 
<212> PRT 

<213> Artificial Sequence 

<220>

<223> Sequence of a synthetic luciferase

<400> 25

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 26
<211> 542
<212> PRT

<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase

<400> 26

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 27 
<211> 542 
<212> PRT 
<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase
<400> 27

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 28

<211> 542

<212> PRT

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase
<400> 28

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 29
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 29

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 30
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 30

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Asn Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Ser Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 31
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 31

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 32
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 32

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 33
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 33

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 34
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 34

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 35
<211> 29
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 35
acgccagccc aagcttaggc ctgagtggc

<210> 36
<211> 44
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 36
cttaattctc cccatccccc tgttgacaat taatcatcgg ctcg

<210> 37
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 37
tataatgtga ggaattgcga gcggataaca atttcacaca

<210> 38
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 38
atgggatgtt acctagacca atatgaaata tttggtaaat

<210> 39
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 39
aaatgcttaa tgaatttcaa aaaaaaaaaa aaaggaattc

<210> 40
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 40
gatatcaagc ttatcgatac cgtcgacctc gaggattata

<210> 41
<211> 37
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 41
tagaaaaagg cctcggcggc cgctagttca gtcagtt

<210> 42
<211> 17
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 42
aactgactga actagcg

<210> 43
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 43
gccgccgagg cctttttcta tataatcctc gaggtcgacg

<210> 44
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 44
gtatcgataa gcttgatatc gaattccttt tttttttttt

<210> 45
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 45
agcttgatat cgaattcctt tttttttttt tttgaaattc

<210> 46
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 46
ttgaaattca ttaagcattt atttaccaaa tatttcatat

<210> 47
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 47
tggtctaggt aacatcccat cactagcttt tttttctata

<210> 48
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 48
tcgcaattcc tcacattata cgagccgatg attaattgtc

<210> 49
<211> 53
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 49
aacaggggga tggggagaat taaggccact caggcctaag cttgggctgg cgt

<210> 50
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 50
ggaaacagga tcccatgatg aaacgcgaaa agaacgtgat

<210> 51
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 51
ctacggccca gaaccactgc atccactgga agacctcacc

<210> 52
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 52
gctggtgaga tgctcttccg agcactgcgt aaacatagtc

<210> 53
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 53
acctccctca agcactcgtg gacgtcgtgg gagacgagag

<210> 54
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 54

gctctcctac aaagaatttt tcgaagctac tgtgctgttg

<210> 55
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 55
gcccaaagcc tccataattg tgggtacaaa atgaacgatg

<210> 56
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 56
tggtgagcat ttgtgctgag aataacactc gcttctttat

<210> 57
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 57
tcctgtaatc gctgcttggt acatcggcat gattgtcgcc

<210> 58
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 58
cctgtgaatg aatcttacat cccagatgag ctgtgtaagg

<210> 59
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 59
ttatgggtat tagcaaacct caaatcgtct ttactaccaa

<210> 60
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 60
aaacatcttg aataaggtct tggaagtcca gtctcgtact

<210> 61
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 61
aacttcatca aacgcatcat tattctggat accgtcgaaa

<210> 62
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 62
acatccacgg ctgtgagagc ctccctaact tcatctctcg

<210> 63
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 63
ttacagcgat ggtaatatcg ctaatttcaa gcccttgcat

<210> 64
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 64
tttgatccag tcgagcaagt ggccgctatt ttgtgctcct

<210> 65
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 65
ccggcaccac tggtttgcct aaaggtgtca tgcagactca

<210> 66
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 66
ccagaatatc tgtgtgcgtt tgatccacgc tctcgaccct

<210> 67
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 67
cgtgtgggta ctcaattgat ccctggcgtg actgtgctgg

<210> 68
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 68
tgtatctgcc tttctttcac gcctttggtt tctctattac

<210> 69
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 69
cctgggctat ttcatggtcg gcttgcgtgt catcatgttt

<210> 70
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 70
cgtcgcttcg accaagaagc cttcttgaag gctattcaag

<210> 71
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 71
actacgaggt gcgttccgtg atcaacgtcc cttcagtcat

<210> 72
<211> 43
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 72
tttgttcctg agcaaatctc ctttggttga caagtatgat ctg

<210> 73
<211> 37
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 73
agcagcttgc gtgagctgtg ctgtggcgct gctcctt

<210> 74
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 74
tggccaaaga agtggccgag gtcgctgcta agcgtctgaa

<210> 75
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 75
cctccctggt atccgctgcg gttttggttt gactgagagc

<210> 76
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 76
acttctgcta acatccatag cttgcgagac gagtttaagt

<210> 77
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 77
ctggtagcct gggtcgcgtg actcctctta tggctgcaaa

<210> 78
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 78
gatcgccgac cgtgagaccg gcaaagcact gggcccaaat

<210> 79
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 79
caagtcggtg aattgtgtat taagggccct atggtctcta

<210> 80
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 80
aaggctacgt gaacaatgtg gaggccacta aagaagccat

<210> 81
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 81
tgatgatgat ggctggctcc atagcggcga cttcggttac  40

<210> 82
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 82
tatgatgagg acgaacactt ctatgtggtc gatcgctaca  40

<210> 83
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 83
aagaattgat taagtacaaa ggctctcaag tcgcaccagc  40

<210> 84
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 84
cgaactggaa gaaattttgc tgaagaaccc ttgtatccgc  40

<210> 85
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 85
gacgtggccg tcgtgggtat cccagacttg gaagctggcg

<210> 86
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 86
agttgcctag cgcctttgtg gtgaaacaac ccggcaagga

<210> 87
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 87
gatcactgct aaggaggtct acgactattt ggccgagcgc

<210> 88
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 88
gtgtctcaca ccaaatatct gcgtggcggc gtccgcttcg

<210> 89
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 89
tcgattctat tccacgcaac gttaccggta agatcactcg

<210> 90
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 90
taaagagttg ctgaagcaac tcctcgaaaa agctggcggc

<210> 91
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 91
tagtaaagtc ttcatgatta tatagaaaaa aaagctagtg

<210> 92
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 92
taatcatgaa gactttacta gccgccagct ttttcgagga

<210> 93
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 93
gttgcttcag caactcttta cgagtgatct taccggtaac

<210> 94
<211> 39
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 94
gttgcgtgga atagaatcga cgaagcggac gccgccacg

<210> 95
<211> 41
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 95
cagatatttg gtgtgagaca cgcgctcggc caaatagtcg t

<210> 96
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 96
agacctcctt agcagtgatc tccttgccgg gttgtttcac

<210> 97
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 97
cacaaaggcg ctaggcaact cgccagcttc caagtctggg

<210> 98
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 98
atacccacga cggccacgtc gcggatacaa gggttcttca

<210> 99
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 99
gcaaaatttc ttccagttcg gctggtgcga cttgagagcc

<210> 100
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 100
tttgtactta atcaattctt tgtagcgatc gaccacatag

<210> 101
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 101
aagtgttcgt cctcatcata gtaaccgaag tcgccgctat

<210> 102
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 102
ggagccagcc atcatcatca atggcttctt tagtggcctc

<210> 103
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 103
cacattgttc acgtagcctt tagagaccat agggccctta  40

<210> 104
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 104
atacacaatt caccgacttg atttgggccc agtgctttgc  40

15<210> 105 
<2ia> 40 
<212> DNA 

<213> Artificial Sequence 

20<220> 

<223> An oligonucleotide 

<400> 105 

cggtctcacg gtcggcgatc tttgcagcca taagaggagt 40 

25 

<210> 106 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

3 0 

<220> 

<223> An oligonucleotide 
<400> 106 

3 5cacgcgaccc aggctaccag acttaaactc gtctcgcaag , 40 

<210> 107
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 107

ctatggatgt tagcagaagt gctctcagtc aaaccaaaac

<210> 108 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 108

cgcagcggat accagggagg ttcagacgct tagcagcgac 

<210> 109 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 109

ctcggccact tctttggcca aaggagcagc gccacagcac 

<210> 110 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 110

agctcacgca agctgctcag atcatacttg tcaaccaaag 



<210> 111
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 111

gagatttgct caggaacaaa atgactgaag ggacgttgat

<210> 112 
<211> 36 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 112

cacggaacgc acctcgtagt cttgaatagc cttcaa

<210> 113 
<211> 44 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 113

gaaggcttct tggtcgaagc gacgaaacat gatgacacgc aagc 

<210> 114 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 114

cgaccatgaa atagcccagg gtaatagaga aaccaaaggc 

<210> 115
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 115

gtgaaagaaa ggcagataca ccagcacagt cacgccaggg 

<210> 116
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 116

atcaattgag tacccacacg agggtcgaga gcgtggatca

<210> 117 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 117

aacgcacaca gatattctgg tgagtctgca tgacaccttt

<210> 118 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 118

aggcaaacca gtggtgccgg aggagcacaa aatagcggcc

<210> 119 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 119

acttgctcga ctggatcaaa atgcaagggc ttgaaattag 

<210> 120 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 120

cgatattacc atcgctgtaa cgagagatga agttagggag 

<210> 121 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 121

gctctcacag ccgtggatgt tttcgacggt atccagaata 



<210> 122
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 122

atgatgcgtt tgatgaagtt agtacgagac tggacttcca

<210> 123 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 123

agaccttatt caagatgttt ttggtagtaa agacgatttg

<210> 124 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 124

aggtttgcta atacccataa ccttacacag ctcatctggg 

<210> 125 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 125

atgtaagatt cattcacagg ggcgacaatc atgccgatgt 40



<210> 126 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 126

accaagcagc gattacagga ataaagaagc gagtgttatt 40 

<210> 127
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 127

ctcagcacaa atgctcacca catcgttcat tttgtaccca 40

<210> 128 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 128

caattatgga ggctttgggc caacagcaca gtagcttcga 40

<210> 129 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 129

aaaattcttt gtaggagagg ctctcgtctc ccacgacgtc

<210> 130 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 130

cacgagtgct tgagggaggt gactatgttt acgcagtgct 

<210> 131 

<211> 40 

<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 131

cggaagagca tctcaccagc ggtgaggtct tccagtggat 

<210> 132
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 132

gcagtggttc tgggccgtag atcacgttct tttcgcgttt 
<210> 133
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 133

catcatggga tcctgtttcc tgtgtgaaat tgttatccgc

<210> 134 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 134

ggaaacagga tcccatgatg aagcgtgaga aaaatgtcat

<210> 135 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 135

ctatggccct gagcctctcc atcctttgga ggatttgact 

<210> 136 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 136

gccggcgaaa tgctgtttcg tgctctccgc aagcactctc 

<210> 137 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 137

atttgcctca agccttggtc gatgtggtcg gcgatgaatc 

<210> 138
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 138

tttgagctac aaggagtttt ttgaggcaac cgtcttgctg

<210> 139 
<211> 40 

<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 139

gctcagtccc tccacaattg tggctacaag atgaacgacg

<210> 140 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 140

tcgttagtat ctgtgctgaa aacaataccc gtttcttcat

<210> 141 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 141

tccagtcatc gccgcatggt atatcggtat gatcgtggct 

<210> 142 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 142

ccagtcaacg agagctacat tcccgacgaa ctgtgtaaag 

<210> 143 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 143

tcatgggtat ctctaagcca cagattgtct tcaccactaa 



<210> 144
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 144

gaatattctg aacaaagtcc tggaagtcca aagccgcacc

<210> 145 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 145

aactttatta agcgtatcat catcttggac actgtggaga
<210> 146 

<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 146

atattcacgg ttgcgaatct ttgcctaatt tcatctctcg 

<210> 147 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 147

ctattcagac ggcaacatcg caaactttaa accactccac 40 

<210> 148
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 148

ttcgaccctg tggaacaagt tgcagccatt ctgtgtagca 40 

<210> 149
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 149

gcggtactac tggactccca aagggagtca tgcagaccca 40

<210> 150 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 150

tcaaaacatt tgcgtgcgtc tgatccatgc tctcgatcca 40

<210> 151 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 151

cgctacggca ctcagctgat tcctggtgtc accgtcttgg
<210> 152 

<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 152

tctacttgcc tttcttccat gctttcggct ttcatattac 

<210> 153 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 153

tttgggttac tttatggtcg gtctccgcgt gattatgttc 

<210> 154 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 154

cgccgttttg atcaggaggc tttcttgaaa gccatccaag 



<210> 155
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 155

attatgaagt ccgcagtgtc atcaacgtgc ctagcgtgat

<210> 156
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 156

cctgtttttg tctaagagcc cactcgtgga caagtacgac

<210> 157 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 157

ttgtcttcac tgcgtgaatt gtgttgcggt gccgctccac 

<210> 158 

<211> 40 

<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 158

tggctaagga ggtcgctgaa gtggccgcca aacgcttgaa

<210> 159
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 159

tcttccaggg attcgttgtg gcttcggcct caccgaatct 

<210> 160
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 160

accagcgcta ttattcagtc tctccgcgat gagtttaaga

<210> 161 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 161

gcggctcttt gggccgtgtc actccactca tggctgctaa

<210> 162 

<211> 40 

<212> DNA 

<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 162

gatcgctgat cgcgaaactg gtaaggcttt gggccctaac

<210> 163 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 163

caagtgggcg agctgtgtat caaaggccct atggtgagca 

<210> 164 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 164

agggttatgt caataacgtc gaagctacca aggaggccat 

<210> 165 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 165

cgacgacgac ggctggttgc attctggtga ttttggatat 
<210> 166
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 166

tacgacgaag atgagcattt ttacgtcgtg gatcgttaca

<210> 167 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 167

aggagctgat caaatacaag ggtagccagg ttgctccagc

<210> 168 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 168

tgagttggag gagattctgt tgaaaaatcc atgcattcgc 

<210> 169 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 169

gatgtcgctg tggtcggcat tcctgatctg gaggccggcg 

<210> 170
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 170

aactgccttc tgctttcgtt gtcaagcagc ctggtaaaga 

<210> 171
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 171

aattaccgcc aaagaagtgt atgattacct ggctgaacgt

<210> 172
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 172

gtgagccata ctaagtactt gcgtggcggc gtgcgttttg

<210> 173 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 173

ttgactccat ccctcgtaac gtaacaggca aaattacccg

<210> 174 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 174

caaggagctg ttgaaacaat tgttggagaa ggccggcggt 

<210> 175 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 175

tagtaaagtc ttcatgatta tatagaaaaa aaagctagtg 

<210> 176 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 176

taatcatgaa gactttacta accgccggcc ttctccaaca 



<210> 177
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 177

attgtttcaa cagctccttg cgggtaattt tgcctgttac

<210> 178 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 178

gttacgaggg atggagtcaa caaaacgcac gccgccacgc

<210> 179 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 179

aagtacttag tatggctcac acgttcagcc aggtaatcat 

<210> 180 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 180

acacttcttt ggcggtaatt tctttaccag gctgcttgac 

<210> 181 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 181

aacgaaagca gaaggcagtt cgccggcctc cagatcagga 

<210> 182
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 182

atgccgacca cagcgacatc gcgaatgcat ggatttttca

<210> 183 
<211> 40 
<212> DNA 

<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 183

acagaatctc ctccaactca gctggagcaa cctggctacc

<210> 184 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 184

cttgtatttg atcagctcct tgtaacgatc cacgacgtaa

<210> 185 

<211> 40 

<212> DNA 

<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 185

aaatgctcat cttcgtcgta atatccaaaa tcaccagaat 

<210> 186 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 186

gcaaccagcc gtcgtcgtcg atggcctcct tggtagcttc 

<210> 187 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 187

gacgttattg acataaccct tgctcaccat agggcctttg 
<210> 188
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 188

atacacagct cgcccacttg gttagggccc aaagccttac

<210> 189 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 189

cagtttcgcg atcagcgatc ttagcagcca tgagtggagt

<210> 190 

<211> 40 

<212> DNA 

<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 190

gacacggccc aaagagccgc tcttaaactc atcgcggaga 

<210> 191 
<211> 37 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 191

gactgaataa tagcgctggt agattcggtg aggccga 

<210> 192 
<211> 43
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 192

agccacaacg aatccctgga agattcaagc gtttggcggc cac 

<210> 193
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 193

ttcagcgacc tccttagcca gtggagcggc accgcaacac

<210> 194 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 194

aattcacgca gtgaagacaa gtcgtacttg tccacgagtg

<210> 195 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 195

ggctcttaga caaaaacagg atcacgctag gcacgttgat

<210> 196 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 196

gacactgcgg acttcataat cttggatggc tttcaagaaa 

<210> 197 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 197

gcctcctgat caaaacggcg gaacataatc acgcggagac 

<210> 198 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 198

cgaccataaa gtaacccaaa gtaatatgaa agccgaaagc 



<210> 199
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 199

atggaagaaa ggcaagtaga ccaagacggt gacaccagga

<210> 200
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 200

atcagctgag tgccgtagcg tggatcgaga gcatggatca

<210> 201 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 201

gacgcacgca aatgttttga tgggtctgca tgactccctt 

<210> 202 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 202

tgggagtcca gtagtaccgc tgctacacag aatggctgca

<210> 203 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 203

acttgttcca cagggtcgaa gtggagtggt ttaaagtttg 

<210> 204
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 204

cgatgttgcc gtctgaatag cgagagatga aattaggcaa

<210> 205 

<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 205

agattcgcaa ccgtgaatat tctccacagt gtccaagatg

<210> 206 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 206

atgatacgct taataaagtt ggtgcggctt tggacttcca

<210> 207 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 207

ggactttgtt cagaatattc ttagtggtga agacaatctg 

<210> 208 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 208

tggcttagag atacccatga ctttacacag ttcgtcggga 

<210> 209 
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 209

atgtagctct cgttgactgg agccacgatc ataccgatat 



<210> 210
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 210

accatgcggc gatgactgga atgaagaaac gggtattgtt

<210> 211 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 211

ttcagcacag atactaacga cgtcgttcat cttgtagcca

<210> 212 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 212

caattgtgga gggactgagc cagcaagacg gttgcctcaa 

<210> 213 
<211> 40 
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 213

aaaactcctt gtagctcaaa gattcatcgc cgaccacatc 

<210> 214
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 214

gaccaaggct tgaggcaaat gagagtgctt gcggagagca 

<210> 215
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 215

cgaaacagca tttcgccggc agtcaaatcc tccaaaggat

<210> 216 
<211> 40 
<212> DNA 

<213> Artificial Sequence 

<220>
<223> An oligonucleotide

<400> 216

ggagaggctc agggccatag atgacatttt tctcacgctt

<210> 217
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 217

catcatggga tcctgtttcc tgtgtgaaat tgttatccgc

<210> 218 

<211> 542 

<212> PRT 

<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 218

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 219
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 219

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 220 
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 220

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser Tyr Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540
<210> 221
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 221

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly




245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 222 
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 222

Met Met Lys Arg Glu Lys Asn Val lie Tyr Gly Pro Glu Pro Leu His 
1 5 10" 15 

lOPro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
20 25 30 

Lys His Ser His Leu Pro Gin Ala Leu Val Asp Val Val Gly Asp Glu 

35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gin 
15 50 55 60. 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser lie Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe lie Pro Val lie Ala Ala Trp Tyr 
85 90 95 

2 0Ile Gly Met lie Val Ala Pro Val Asn Glu Ser Tyr lie Pro Asp Glu 

100 105 110 

Leu Cys. Lys Val Met Gly lie Ser Lys Pro Gin He Val Phe Thr Thr 

115 120 125 

Lys Asn He Leu Asn Lys Val Leu Glu Val Gin Ser Arg Thr Asn Phe 
25 130 135 140 

He Lys Arg He He He Leu Asp Thr Val Glu Asn He His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe He Ser Arg Tyr Ser Asp Gly Asn He Ala 
165 170 175 

3 0Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gin Val Ala Ala He 

180 185 190 

Leu Cys Set Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gin Thr 

195 200 205 

His Gin Asn He Cys Val Arg Leu He His Ala Leu Asp Pro Arg Tyr 
35 210 215 • 220 

Gly Thr Gin Leu He Pro Gly -Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe His He Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 223
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 223

Met Ile Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Thr Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Thr Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Val Lys Ala Gly Gly
530 535 540



<210> 224
<211> 311
<212> PRT
<213> Renilla reniformis

<400> 224

Met Thr Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310



<210> 225
<211> 311
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 225

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310

<210> 226
<211> 311
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 226

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310

<210> 227
<211> 311
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 227

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310

<210> 228
<211> 14
<212> DNA
<213> Artificial Sequence

<220>
<223> A consensus sequence
<221> misc_feature
<222> (1)...(14)
<223> n = A,T,C or G

<400> 228
yggmnnnnng ccaa

<210> 229
<211> 38
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 229
gtactgagac gacgccagcc caagcttagg cctgagtg

<210> 230
<211> 38
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 230
ggcatgagcg tgaactgact gaactagcgg ccgccgag

<210> 231
<211> 24
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 231
ggatcccatg gtgaagcgtg agaa

<210> 232
<211> 21
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 232
ggatcccatg gtgaaacgcg a

<210> 233
<211> 31
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 233
ctagcttttt tttctagata atcatgaaga c

<210> 234
<211> 54
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 234
caaaaagctt ggcattccgg tactgttggt aaagccacca tggtgaagcg agag

<210> 235
<211> 26
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 235
caattgttgt tgttaacttg tttatt

<210> 236
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 236
aaccatggct tccaaggtgt acgaccccga gcaacgcaaa

<210> 237
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 237
gctctagaat tactgctcgt tcttcagcac gcgctccacg

<210> 238
<211> 31
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 238
cgctagccat ggcttcgaaa gtttatgatc c

<210> 239
<211> 25
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 239
ggccagtaac tctagaatta ttgtt

<210> 240
<211> 5
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 240
tataa

<210> 241
<211> 6
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 241
stratg

<210> 242
<211> 9
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide
<221> misc_feature
<222> (1)...(9)
<223> n = A,T,C or G

<400> 242
mttncnnma

<210> 243
<211> 5
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 243
tratg

<210> 244
<211> 7
<212> DNA
<213> Artificial Sequence

<220>
<223> A consensus sequence

<400> 244
tgastma

<210> 245
<211> 14
<212> DNA
<213> Artificial Sequence

<220>
<223> A consensus sequence
<221> misc_feature
<222> (1)...(14)
<223> n = A,T,C or G

<400> 245
yggnmnnnng ccaa

<210> 246
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 246
aaccatggct tccaaggtgt acgaccccga gcaacgcaaa 40

<210> 247
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 247
cgcatgatca ctgggcctca gtggtgggct cgctgcaagc 40

<210> 248
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 248
aaatgaacgt gctggactcc ttcatcaact actatgattc 40

<210> 249
<211> 50
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 249
cgagaagcac gccgagaacg ccgtgatttt tctgcatggt aacgctgcct 50

<210> 250
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 250
ccagctacct gtggaggcac gtcgtgcctc acatcgagcc 40

<210> 251
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 251
cgtggctaga tgcatcatcc ctgatctgat cggaatgggt 40

<210> 252
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 252
aagtccggca agagcgggaa tggctcatat cgcctcctgg 40

<210> 253
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 253
atcactacaa gtacctcacc gcttggttcg agctgctgaa

<210> 254
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 254
ccttccaaag aaaatcatct ttgtgggcca cgactggggg

<210> 255
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 255
gcttgtctgg cctttcacta ctcctacgag caccaagaca

<210> 256
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 256
agatcaaggc catcgtccat gctgagagtg tcgtggacgt

<210> 257
<211> 45
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 257
gatcgagtcc tgggacgagt ggcctgacat cgaggaggat atcgc

<210> 258
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 258
cctgatcaag agcgaagagg gcgagaaaat ggtgcttgag

<210> 259
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 259
aataacttct tcgtcgagac catgctccca agcaagatca

<210> 260
<211> 45
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 260
tgcggaaact ggagcctgag gagttcgctg cctacctgga gccat

<210> 261
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 261
tcaaggagaa gggcgaggtt agacggccta ccctctcctg

<210> 262
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 262
gcctcgcgag atccctctcg ttaagggagg caagcccgac

<210> 263
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 263
gtcgtccaga ttgtccgcaa ctacaacgcc taccttcggg

<210> 264
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 264
ccagcgacga tctgcctaag atgttcatcg agtccgaccc 40

<210> 265
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 265
tgggttcttt tccaacgcta ttgtcgaggg agctaagaag 40

<210> 266
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 266
ttccctaaca ccgagttcgt gaaggtgaag ggcctccact 40

<210> 267
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 267
tcagccagga ggacgctcca gatgaaatgg gtaagtacat 40

<210> 268
<211> 49
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 268
caagagcttc gtggagcgcg tgctgaagaa cgagcagtaa ttctagagc

<210> 269
<211> 29
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 269
gctctagaat tactgctcgt tcttcagca

<210> 270
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 270
cgcgctccac gaagctcttg atgtacttac ccatttcatc

<210> 271
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 271
tggagcgtcc tcctggctga agtggaggcc cttcaccttc

<210> 272
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 272
acgaactcgg tgttagggaa cttcttagct ccctcgacaa

<210> 273
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 273
tagcgttgga aaagaaccca gggtcggact cgatgaacat

<210> 274
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 274
cttaggcaga tcgtcgctgg cccgaaggta ggcgttgtag

<210> 275
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 275
ttgcggacaa tctggacgac gtcgggcttg cctcccttaa

<210> 276
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 276
cgagagggat ctcgcgaggc caggagaggg taggccgtct

<210> 277
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 277
aacctcgccc ttctccttga atggctccag gtaggcagcg

<210> 278
<211> 45
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 278
aactcctcag gctccagttt ccgcatgatc ttgcttggga gcatg

<210> 279
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 279
gtctcgacga agaagttatt ctcaagcacc attttctcgc 40

<210> 280
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 280
cctcttcgct cttgatcagg gcgatatcct cctcgatgtc 40

<210> 281
<211> 43
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 281
aggccactcg tcccaggact cgatcacgtc cacgacactc tca 43

<210> 282
<211> 42
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 282
gcatggacga tggccttgat cttgtcttgg tgctcgtagg ag

<210> 283
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 283
tagtgaaagg ccagacaagc cccccagtcg tggcccacaa

<210> 284
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 284
agatgatttt ctttggaagg ttcagcagct cgaaccaagc

<210> 285
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 285
ggtgaggtac ttgtagtgat ccaggaggcg atatgagcca

<210> 286
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 286
ttcccgctct tgccggactt acccattccg atcagatcag 40

<210> 287
<211> 45
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 287
ggatgatgca tctagccacg ggctcgatgt gaggcacgac gtgcc 45

<210> 288
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 288
tccacaggta gctggaggca gcgttaccat gcagaaaaat 40

<210> 289
<211> 45
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 289
cacggcgttc tcggcgtgct tctcggaatc atagtagttg atgaa 45

<210> 290
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 290
ggagtccagc acgttcattt gcttgcagcg agcccaccac

<210> 291
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 291
tgaggcccag tgatcatgcg tttgcgttgc tcggggtcgt

<210> 292
<211> 20
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 292
acaccttgga agccatggtt

<210> 293
<211> 10
<212> DNA
<213> Artificial Sequence

<220>
<223> A Kozak sequence

<400> 293
aaccatggct

<210> 294
<211> 12
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 294
taattctaga gc

<210> 295
<211> 32
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 295
gcgtagccat ggtaaagcgt gagaaaaatg tc

<210> 296
<211> 33
<212> DNA
<213> Artificial Sequence

<220>
<223> A primer

<400> 296
ccgactctag attactaacc gccggccttc acc

<210> 297
<211> 1626
<212> DNA
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 297

atggtgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60
ctcaccgctg gtgagatgct cttccgagca ctgcgtaaac atagtcacct ccctcaagca 120
ctcgtggacg tcgtgggaga cgagagcctc tcctacaaag aatttttcga agctactgtg 180
ctgttggccc aaagcctcca taattgtggg tacaaaatga acgatgtggt gagcatttgt 240
gctgagaata acactcgctt ctttattcct gtaatcgctg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360
aaacctcaaa tcgtctttac taccaaaaac atcttgaata aggtcttgga agtccagtct 420
cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccacggctgt 480
gagagcctcc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaagccc 540
ttgcattttg atccagtcga gcaagtggcc gctattttgt gctcctccgg caccactggt 600
ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660
gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tctgcctttc 720
tttcacgcct ttggtttctc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780
atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840
tccgtgatca acgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900
tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960
gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020
gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080
cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140
ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200
aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260
ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 1320
tacaaaggct ctcaagtcgc accagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacccgg caaggagatc actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttctattcca 1560
cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620
ggcggc 1626

<210> 298
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 298

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335



Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 299
<211> 1626
<212> DNA
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 299

atggtgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60
ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120
ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180
ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240
gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300
gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360
aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420
cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480
gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540
ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600
ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660
gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720
ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780
atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840
agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900
tacgacttgt cttcactgcg tgaattgtgt t cr c cicit Q c c cf ctccactggc taaQaagatc 960
gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020
gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080
cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140
ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200
aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260
ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320
[illegible] gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380
attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440
ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500
gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560
cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620
ggcggt 1626



<210> 300
<211> 542
<212> PRT
<213> Artificial Sequence
<220>
<223> Sequence of a synthetic luciferase
<400> 300

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg




20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 301
<211> 1626
<212> DNA
<213> Artificial Sequence
<220>
<223> Sequence of a synthetic luciferase
<400> 301

atggtaaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 
ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120 
ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 
ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 
gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300
frt" r^rt t" o a


L. d d d^ d^j 


UUdUd^ Uv.'wv.' 


aaccraactcrt 


gtaaagfccat 


aaortatct c t 


3'60 


cL ct^ L« O d w ct^ d 


L.U>^ L.^UU'L'd^^ 


U d^ CdCL^ nCl L> 


attctcraaca 

CL W W V>L 


aacrtcctcrcra 


agtccaaagc 


420 


^f^f^ O ^ Q s ^ ^ 


L> l«dL> L>dd^O^ 


L>d L«L.«dL-v^ClL^^ 


4- tacracar't'cr 


tggagaatat 


tcacggttgc 


480 


- — 3 d d c C trt <^ 


(,,>WddL.L.L.^d L. 


UL«^L>S_<^V^L«CLli.i 


"t caaaccrana 


acatcgcaaa 


"ctttaaacca 


~ 540" 


C ^ ^ 4* ^ ^^^Y 

OCuCCdCuCCg^ 


dccc ug uggd 


dUdd^ U uy Ud 


y ww>d u« L>y l-» 


y u&y Vrfoy wyy 


tactactQcra 


600 


cticcca,aag9 


gagtcatzgca 


gdCCCduCdd 


a a ^a ^ ^ ^ nf^n 
ddUd u u uy uy 


i"n/^rT4" r^4*cra t* 
uyuy uu wynu 


u> w c& u»y w w w i« v« 


660 




acggcacccd 


gc ugduuccu 


yy uy uuduuy 


uuuuyyi^^^d 


r" 4" "t* cr c c fc 1 1 c 


720 


X. 1- ^ ^ 4.— — VX. 

ucccac^cut. 


cggc t t c a 


udu udC uu ug 


rTfT4* 4* a t" ^ ^ a 
yy uuduuuud 


uyy uuyy uuu 


fr«cfecftaatt 


780 




gud.cgduCd 


ggaggct-uuc 


ttgaaagcca 


tccaagatta 


tgaagtccgc 


840 


XUd^U^uCctUCd 


^ r"^ ft t " «■« m ^ 4* s ^* 

dcgugccudg 


uguyduuuuy 


tttttgtcta 


agagcccact 


cgtggacaag 


900 


t acgac ttgt 


^ 4* 4~ ^ ^ ^ 

c uucac cgcy 


ugddu uy ug u 


tgcggtgccg 


ctccactggc 


taaggaggtc 


960 


gctgaagrgg 


^ ftf~* ^ ^ ^ f* f1 

CCgCCdddCjg 


r^4~^rTaat~r^+"t* 
u u uy dd uu u u 


ccagggattc 


gttgtggctt 


cggcctcacc 


102 0 


gaatctacca 


^ ^ 4* 4" Q 4^ 

gL.gcgdUL.du 


r^zA rra ^ 4~ 
UUdy dU. UU U U 


ggggatgagt 


ttaagagcgg 


ctctttgggc 


1080 


cgT-guCdCuC 


CdCuCduggc; 


4- rro 4* a a rra 4~ 

uguuddyduu. 


gctgatcgcg aaactggtaa ggctttgggc 


1140 


ISccgaaccaag 


tgggcgagct 


guguducaad 


ggccctatgg 


tgagcaaggg 


ttatgtcaat 


1200 


aacg'bt^gaag 


ctaccaagga 


ggccducgdu 


gacgacggct 


ggttgcattc 


tggtgatttt 


1260 


ggatiatitacg 


acgaaga'tga 


0^^^ A ^ 4* ^ 4* 4^ 

gcau uuu uac 


gtcgtggatc 


gttacaagga 


gctgatcaaa 


1320 


tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380
attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440
ttcgttgtca agcagcctgg tacagaaatt accgccaaag aagtgtatga ttacctggct 1500
gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560
cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggtgaaggcc 1620
ggcggt 1626



<210> 302
<211> 542
<212> PRT
<213> Artificial Sequence
<220>
<223> Sequence of a synthetic luciferase
<400> 302

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Thr Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Thr Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Val Lys Ala Gly Gly
530 535 540



